How Removing Staging Environments Can Improve Your Deployments

Most companies deploying websites or software use pre-production environments to test recent changes before they end up in front of users. While this adds lines of defence for finding problems and bugs, it can also increase cost and complexity, and can have the opposite effect by slowing releases and letting issues slip through. With techniques such as Continuous Delivery encouraging teams to make sure that software is always deployable, there is a movement away from complicated branching structures and testing environments, and momentum towards simpler setups. Squeaky - a company which helps businesses to understand how visitors are using their website or web app without invading their privacy - have taken a different approach, and have outlined why they don’t use a staging environment. They believe that this helps them to ship faster and lowers the number of issues found in production.

In a blog post describing their approach, Lewis Monteith from Squeaky details several problems they found with staging environments:

Pre-live environments are never at parity with production: a scalable cloud-native app will often require many more resources in production to deal with load - but the cost of building exactly the same setup for pre-live is prohibitive. This leads to configuration drift and a scaled-down architecture in pre-live environments, which can prevent testing there from finding issues.

There’s always a queue, which makes releases larger and reduces ownership: pre-live environments can be a bottleneck if multiple developers or teams want to release code at the same time. Having to wait for the environment causes delays and compromises in testing, especially if tests fail and everyone has to wait for them to be fixed. Having a queue of releases causes branch divergence, which then causes pain for developers when a large number of changes need to be merged later.

To reduce the queue, releases then get bundled together, which makes it more likely that bugs are introduced and harder to isolate exactly which change, and whose change, caused a problem. Developers may not even realise that their changes have gone to production.

Process replaces accountability: pre-live environments tend to be run by ops-focused teams, thus deploying to a pre-live environment can come with an implicit handover of responsibility from developers to ops teams.

Squeaky’s alternative approach aims to resolve or avoid these issues - with four key tenets to make this work.

Only merge code that is ready to go live: this approach is backed up by making sure that there are appropriate tests and that the changes have been validated in development.

A flat branching strategy: all branches are forked from the main branch, and changes are only ever merged back there. Smoke testing happens locally on a developer’s computer.

Feature flagging for high-risk changes: Squeaky may ship significant changes behind a feature flag - if they are at all concerned about performance under load or how users may react to a change. They have the ability to do this on a per-user basis.

Hands-on deployments: monitoring, logging and alarms are used comprehensively to ensure there are no issues, and Squeaky also use blue/green deployments to deploy changes to a subset of users until they are sure everything is OK.
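To make the feature-flagging tenet more concrete, here is a minimal sketch of a per-user flag in Python. It is not Squeaky's implementation: the `FeatureFlag` class, its fields and the example flag name are all hypothetical, and a real system would normally load flag state from a database or a flag service rather than hard-coding it.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class FeatureFlag:
    """Hypothetical per-user feature flag (an illustration, not Squeaky's code)."""

    name: str
    # Users who always see the feature, e.g. staff or beta testers.
    enabled_users: set[str] = field(default_factory=set)
    # Share (0-100) of all other users who see the feature.
    rollout_percent: int = 0

    def _bucket(self, user_id: str) -> int:
        # Hash the flag name together with the user id so that each user lands
        # in a stable bucket from 0 to 99, independently for each flag.
        digest = hashlib.sha256(f"{self.name}:{user_id}".encode()).hexdigest()
        return int(digest, 16) % 100

    def is_enabled(self, user_id: str) -> bool:
        if user_id in self.enabled_users:
            return True
        return self._bucket(user_id) < self.rollout_percent


flag = FeatureFlag("new-dashboard", enabled_users={"user-42"})
print(flag.is_enabled("user-42"))  # True: explicitly enabled
print(flag.is_enabled("user-7"))   # False: rollout_percent is still 0
```

Because the bucket is derived from a hash rather than a random draw, a given user gets the same answer on every request, so raising `rollout_percent` only ever adds users to the feature.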

Squeaky’s dropping of a staging environment in favour of many continuous delivery principles has changed the mindset towards shipping software. Removing the buffer for changes before they go live requires upping confidence levels that changes are fit for production. This in turn leads to reduced costs and complexity and has helped speed up their development lifecycle.
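One way to picture where that confidence comes from during a hands-on deployment is a simple promotion gate that compares the new release with the current one while only a subset of users is on it. The sketch below is an assumption-laden illustration: `ReleaseStats`, `next_step`, the `min_requests` threshold and the `tolerance` value are all hypothetical and are not taken from Squeaky's blog post.

```python
from dataclasses import dataclass


@dataclass
class ReleaseStats:
    """Request and error counts for one release (hypothetical shape)."""

    requests: int
    errors: int

    @property
    def error_rate(self) -> float:
        # Avoid dividing by zero before any traffic has arrived.
        return self.errors / self.requests if self.requests else 0.0


def next_step(baseline: ReleaseStats, candidate: ReleaseStats,
              min_requests: int = 500, tolerance: float = 0.005) -> str:
    """Return 'wait', 'rollback' or 'promote' for the candidate release."""
    if candidate.requests < min_requests:
        return "wait"  # not enough traffic yet to judge the new release
    if candidate.error_rate > baseline.error_rate + tolerance:
        return "rollback"  # noticeably worse than the current release
    return "promote"


current = ReleaseStats(requests=10_000, errors=20)
print(next_step(current, ReleaseStats(100, 0)))     # wait
print(next_step(current, ReleaseStats(1_000, 30)))  # rollback
print(next_step(current, ReleaseStats(1_000, 3)))   # promote
```

In practice the inputs would come from the monitoring and alerting stack, and the thresholds would be tuned to the traffic and risk profile of the service.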

 


Community comments

  • My Single Rule - and grounds for immediate termination

    by Kelvin Meeks,

    Over 20 years ago, as the CTO of a start-up, I had a single rule, displayed in a frame on the wall behind my desk.

    "Thou Shalt Not Deploy To Production - without having first deployed to Staging".

    It was cause for immediate termination, then - and still is today.

  • Flying without a safety net

    by James Washington,

    I have to agree with Kelvin on this. What the article doesn't address is use cases where commonly used infrastructure or software is updated or a deploy script is modified. Without a staging environment, these changes go straight to production. When your customers suffer a massive business disruption or lost data and are asking "What happened?", I'm sure they will understand the wisdom of not testing in a staging environment.

  • Re: Flying without a safety net

    by Matt Saunders,

    Yes - this is an interesting angle - but one that I believe has become easier in recent years with the advances in automation of cloud infrastructure. I would say that the same principles hold true - you need an environment to test out cloud infrastructure changes and this can be harder to achieve than just doing them in staging. But the same problems with staging persist. I'd encourage people to look at short-lived environments taking the place of staging environments and to strive towards Continuous Delivery for infrastructure components too.

  • Re: My Single Rule - and grounds for immediate termination

    by Matt Saunders,

    I don't think that single rule works so well any more, for all the reasons stated in the article.

    No-one is saying that you shouldn't test things appropriately before deploying to production. The tech has advanced so that it's possible to do this in a short-lived and scalable way without the problems listed in the article; and it's this that I'm advocating for.

    I'd argue for updating your rule to say 'without having first deployed to an appropriate environment to fully test', though you might need to decrease the font size slightly :)

  • There is more to the "ditch staging" idea

    by Maksim Mozajev,

    I believe Squeaky is not stupid, so there is more to the "ditch staging" idea.

    What this article fails to emphasize is that if you are running a microservices-based architecture, it is very hard to test individual microservices, especially in a high-load environment.

    Integrated tests are very expensive and slow to run. You need: your microservice, all related microservices, test databases for everyone, plus all other dependencies. Imagine you have done all your tests and then an adjacent team makes a change to a related microservice that can potentially break everything. You have to run the tests again, at least.

    What they are most probably doing is just releasing new stuff to a limited number of users (as stated) and using them as 'free' testers. They monitor the version's behavior closely, watching for error rates, performance etc. If the app does not fail and there are no complaints from the users, the version goes live to a wider audience.

    However, this is not a universal way of doing it. It won't work for many scenarios simply because the business is different.

    Moreover, relying on developers to thoroughly test what they release is a management dream that seldom comes true. Devs just don't like to test.
