Most companies deploying websites or software use pre-production environments to test recent changes before they end up in front of users. While this adds lines of defence for finding problems and bugs, it can also increase cost and complexity, and can end up having the opposite effect. With techniques such as Continuous Delivery encouraging teams to make sure that software is always deployable, there is movement away from complicated branching structures and testing environments, and momentum towards simpler setups. Squeaky - a company which helps businesses to understand how visitors are using their website or web app without invading their privacy - have taken a different approach from most, and have outlined why they don’t use a staging environment. They believe that this helps them to ship faster and to lower the number of issues found in production.
In a blog post describing their approach, Lewis Monteith from Squeaky details several problems they found with staging environments:
Pre-live environments are never at parity with production: a scalable cloud-native app will often require many more resources in production to deal with load - but the cost of building exactly the same setup for pre-live is prohibitive. This leads to configuration drift and a scaled-down architecture in pre-live environments, which can prevent testing in those environments from finding issues.
There’s always a queue, which makes releases larger and reduces ownership: Pre-live environments can be a bottleneck if multiple developers or teams want to release code at the same time. Having to wait for the environment causes delays and compromises in testing, especially if tests fail and everyone has to wait for them to be fixed. Having a queue of releases causes branch divergence, which then causes pain for developers when a large number of changes need to be merged later.
To reduce the queue, releases are then bundled together, which makes it more likely that bugs are introduced. It also makes it harder to track down exactly which change, and whose change, caused a problem, because issues are harder to isolate and developers may not realise their changes have gone to production.
Process replaces accountability: pre-live environments tend to be run by ops-focused teams, thus deploying to a pre-live environment can come with an implicit handover of responsibility from developers to ops teams.
Squeaky’s alternative approach aims to resolve or avoid these issues - with four key tenets to make this work.
Only merge code that is ready to go live: this approach is backed up by making sure there are appropriate tests, and the changes have been validated in development.
A flat branching strategy: all branches are forked from the main branch, and changes are only ever merged back there. Smoke testing happens locally on a developer’s computer.
Feature flagging for high-risk changes: Squeaky may ship significant changes behind a feature flag - if they are at all concerned about performance under load or how users may react to a change. They have the ability to do this on a per-user basis.
Hands-on deployments: monitoring, logging and alarms are used comprehensively to ensure there are no issues, and Squeaky also use blue/green deployments to deploy changes to a subset of users until they are sure everything is OK.
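Two of the tenets above, per-user feature flags and shifting traffic gradually to a new release, can be illustrated with a short Python sketch. This is not Squeaky's implementation, which the blog post does not show: the names `FeatureFlag`, `choose_environment` and `next_green_weight`, the hash-based bucketing, and the error-rate threshold and step size are all assumptions made for illustration.

```python
import hashlib
import random
from dataclasses import dataclass, field
from typing import Set


@dataclass
class FeatureFlag:
    """Hypothetical per-user feature flag (illustrative only)."""
    name: str
    enabled_users: Set[str] = field(default_factory=set)  # explicit opt-ins
    rollout_percent: int = 0  # share of remaining users, 0-100

    def is_enabled(self, user_id: str) -> bool:
        # Users enabled explicitly always see the feature.
        if user_id in self.enabled_users:
            return True
        # Hash flag name + user id so each user lands in a stable bucket 0-99.
        digest = hashlib.sha256(f"{self.name}:{user_id}".encode()).hexdigest()
        return int(digest, 16) % 100 < self.rollout_percent


def choose_environment(green_weight: float, rng: random.Random) -> str:
    """Route one request to the new ("green") or current ("blue") release."""
    if not 0.0 <= green_weight <= 1.0:
        raise ValueError("green_weight must be between 0 and 1")
    return "green" if rng.random() < green_weight else "blue"


def next_green_weight(current: float, error_rate: float,
                      max_error_rate: float = 0.01, step: float = 0.25) -> float:
    """Assumed promotion rule: roll back on errors, otherwise widen the rollout."""
    if error_rate > max_error_rate:
        return 0.0  # send all traffic back to blue
    return min(1.0, current + step)


flag = FeatureFlag("new_dashboard", enabled_users={"user-1"})
print(flag.is_enabled("user-1"))  # explicitly enabled user
print(choose_environment(0.25, random.Random()))
```

Hashing the user ID, rather than choosing at random on each request, keeps a given user on the same side of the flag between visits. In a real setup the error rate fed into the promotion rule would come from the monitoring and alarms described above.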
Squeaky’s dropping of a staging environment in favour of many continuous delivery principles has changed the team’s mindset towards shipping software. Removing the buffer for changes before they go live requires greater confidence that changes are fit for production. According to Squeaky, the approach has reduced costs and complexity and has helped speed up their development lifecycle.