Patterns for Continuous Delivery
Once you have the build servers and scripts necessary for continuous integration, the next question is always, “So what do we do with the builds?” Continuous Delivery, the automatic or semi-automatic promotion of builds from one environment to the next, is often the next step in a company’s evolution.
Continuous Delivery can be applied to companies of any size, but the exact process is going to differ widely from company to company. Clearly the needs of a four-man team that handles everything are going to be different from those of a large, multi-team company with formal QA and a well-equipped production support department. Rather than try to make a one-size-fits-all solution, this white paper will cover a variety of scenarios and options.
Choosing a Continuous Delivery Toolset
Choosing a toolset for continuous delivery is the single least important decision that you’ll make. Once you develop the workflow for your company, you simply have to pick a tool that matches it. Given that it can take a while to set up and configure something, it is not unreasonable to spend a couple of days building your own custom tools.
More importantly, there is little risk of lock-in. Unlike choosing a source control system, you can have as many continuous delivery tools as you want and you can freely switch between them. It is not unheard of for a QA team to use one tool to pull builds from development and another to push them to staging.
The Baseline Scenario
In the baseline scenario we are going to look at patterns for companies with limited resources. By this, I mean an IT department consisting of three to four individuals acting as both developers and administrators. Teams of this size can typically be found supporting small to midsize businesses, especially if the company isn’t focused on technology itself. Large companies may also structure their staff this way, breaking them into small, mostly independent groups with no interaction with other groups.
Before you attempt to employ continuous delivery you need a few prerequisites set up. First and foremost is a source control system with a matching build server. This first build server will also be your continuous integration server. It is the one that ensures that every single check-in can be built. Generally speaking you are going to want an “off the shelf” build server for this role. Building something by hand to monitor check-ins and automatically initiate builds is usually much harder than it sounds. Even if your source control system has triggers you can hook into, building out the other features such as broken build notifications isn’t worth the effort.
Even in a limited resource shop a staging server is essential for continuous delivery. The staging server should mimic one’s production environment as closely as possible. The first question is always, “What is your budget?” If you have a hundred-thousand dollar database server in production, chances are you aren’t going to be able to afford the same for your staging environment. And perhaps you don’t want to.
When trying to mimic a production environment a common mistake is to match the hardware too closely. Let’s say your production environment can handle 100 requests per second. If you bought the same hardware for your staging environment, but only executed a couple of requests per second while testing, your results would be skewed. Ideally you would also purchase and set up load servers to simulate production requests, but that is quite expensive and time consuming. Often a better strategy for teams of this size is to simply reduce the power of one’s staging hardware.
Another requirement that is often overlooked is the versioning of builds. Each build must have a unique way of identifying it as something distinct from all other builds. If you are using a single public branch a simple timestamp or auto-incrementing version number would be sufficient. More complex scenarios will be discussed later.
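As a minimal sketch of the two identification schemes mentioned above, the following Python shows a timestamp-based identifier and an auto-incrementing one. The function names and the identifier format are assumptions made for illustration; many build servers already provide an equivalent counter of their own.

```python
from datetime import datetime, timezone


def timestamp_build_id(started: datetime) -> str:
    """Build ID derived from the UTC time at which the build started.

    Assumes no two builds on the single public branch start within
    the same second.
    """
    return started.astimezone(timezone.utc).strftime("%Y%m%d.%H%M%S")


def next_build_number(previous: int) -> int:
    """Auto-incrementing alternative: the caller stores the last number used."""
    return previous + 1
```

Either value can then be stamped into the package name and the application's version information, so that anyone looking at a server can tell which build it is running.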
With all the pieces outlined above your environment should look something like this:
With a build server in place to compile the code and the staging server standing by to receive it, the next step is to determine your deployment strategy. With a small team there are two basic delivery strategies to choose from: “Deliver on Check-in” or “Deliver on Timer”.
Deliver on Check-in
The Deliver on Check-in strategy has the advantage of instant gratification. Depending on the size of the code base there could be as few as one to two minutes between checking in a new feature and being able to test it on the staging server.
A major downside of this model is that it tends to make the staging server unstable. On many occasions I have seen someone try to test a feature when suddenly a new version is pushed to the staging server, spoiling the test run. Even worse, the staging server often doubles as a demo server, leaving the possibility for mishaps during important presentations.
Another problem with this model is code churn. For example, if three check-ins are performed in quick succession then three build/delivery cycles may be triggered when really only the last one is necessary. In extreme situations this may cause so much churn that the staging environment is never up long enough to be useful. Fortunately most build servers have the option to delay the start of a build or avoid building more than once in a given time interval.
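The delayed-start behaviour can be illustrated with a short sketch. It is not the algorithm of any particular build server; the function name and the use of plain numbers (seconds) for check-in times are assumptions. A check-in triggers a build/delivery cycle only if no other check-in follows it within the quiet period, so a burst of check-ins collapses into a single cycle.

```python
from typing import List, Sequence


def builds_to_trigger(checkin_times: Sequence[float], quiet_period: float) -> List[float]:
    """Return the check-in times that should start a build/delivery cycle.

    A check-in is skipped when another check-in arrives within
    `quiet_period` seconds after it, because the later one supersedes it.
    """
    times = sorted(checkin_times)
    triggers = []
    for i, t in enumerate(times):
        is_last = i == len(times) - 1
        if is_last or times[i + 1] - t > quiet_period:
            triggers.append(t)
    return triggers
```

With check-ins at 0, 20, and 45 seconds, another at 600 seconds, and a 60-second quiet period, only the check-ins at 45 and 600 seconds start a cycle.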
Deliver on Timer
Time-based delivery strategies are far more predictable. Everyone knows exactly when the deliveries are going to be initiated and can plan to check in their code before or after that time as necessary. It is typical to do a build/delivery cycle once or twice a day.
The downside of a daily build is that it can introduce unnecessary stress into the work environment. Developers may find themselves rushing to finish a check-in before the cutoff time for the build. Scheduling the build in the middle of the night when the developers aren’t expected to be working can minimize this stress, but it also means that they cannot perform second level testing until the next day.
When we start using terms like “daily build” it is easy to forget we aren’t actually talking about builds. Rather we are talking about full build/delivery cycles. With continuous integration you know almost immediately when a build breaks. From there it is a simple matter to fix or roll back the errant change so that it will be ready once the scheduled delivery occurs.
Deploying to Production
This scenario assumes there really isn’t anyone available to perform intensive testing against any particular build. Once it is spot-checked on the staging server the build would normally be promoted directly into production. But even here there are several options with trade-offs.
Promote From Staging
A popular option is to promote the verified build directly from the staging machine. This has certain advantages such as a high degree of certainty. In theory there is no possibility of testing a build and then deploying a different build by mistake. It is also easy to write a shell script that copies the files from staging to production.
But as with many things, the theory doesn’t always match reality. With automated pushes hitting the staging server it is quite easy to finish testing, grab a cup of coffee, and then come back to an entirely different build. Or worse, the build agent can start overwriting files on the staging server while they are simultaneously being pushed to the production server.
Promote From the Build Server
A much better option is to promote the build directly from the build server. This eliminates many of the timing conflicts I mentioned in the “from staging” option. Plus it makes one more conscious of the specific build being pushed.
This too is not without its downside; one can easily select the wrong version to push into production.
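One way to reduce the chance of pushing the wrong version is to record which build was actually verified and refuse to promote anything else. The sketch below is an illustration under assumptions, not a feature of any specific build server: the dictionaries stand in for the build server's artifact store and for whatever record is kept when a build passes its spot-check on staging.

```python
import hashlib
from typing import Dict


def artifact_digest(payload: bytes) -> str:
    """SHA-256 digest used to identify the exact bytes that were tested."""
    return hashlib.sha256(payload).hexdigest()


def promote(build_id: str, artifacts: Dict[str, bytes], verified: Dict[str, str]) -> bytes:
    """Return the payload to deploy, or raise if the build was never verified.

    artifacts: build ID -> package archived on the build server (hypothetical store)
    verified:  build ID -> digest recorded when the staging spot-check passed
    """
    if build_id not in verified:
        raise ValueError(f"build {build_id} was never verified on staging")
    payload = artifacts[build_id]
    if artifact_digest(payload) != verified[build_id]:
        raise ValueError(f"build {build_id} does not match the verified package")
    return payload
```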
Rebuild and Push
A third option is to create a new build option for your continuous integration server that includes a push to production step. I caution against this option. While in theory the rebuilt code has exactly the same payload as the one you tested, it adds another opportunity for something to go wrong.
Another problem with this option is that it allows you to choose different compiler options. When you have that option you may be tempted to use a debug build on staging and a release build in production. This can have disastrous consequences if there are behavioral changes between the two build configurations.
Though easy to set up with continuous integration servers, again, I highly recommend that one does not choose this option.
Scenario 2: Adding QA
Once QA gets involved the situation becomes much more complicated. Since you now have to deal with cross-team communication and scheduling, you’ll need more environments and a more meaningful sense of ownership.
Once QA is in the picture you will probably want at least three non-production servers. Code changes are promoted from each server to the next based on certain criteria.
Integration: This is the first place the build hits after a feature is checked in. It is called the integration environment because it is where the feature should be fully integrated with all other features. It is used for spot-checking a build to see if it is suitable for offering to QA. Instability is allowed in this environment, but should not be allowed to linger.
QA: This is where the QA team does most of its work. It is updated from the integration environment on an as-needed basis.
Staging: The staging server is now used exclusively for demonstrations and last minute checks before a build goes into production. Any build that makes it this far should be rock solid. The staging environment may, but does not need to, be connected to production resources such as databases and file systems.
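The promotion order for this scenario can be written down as data, which is roughly what any delivery tool has to encode. This is only a sketch; the environment names follow the descriptions above and the function name is an assumption.

```python
from typing import Optional

# Order in which a build moves through the environments in this scenario.
PIPELINE = ["integration", "qa", "staging", "production"]


def next_environment(current: str) -> Optional[str]:
    """Return the environment a build is promoted to next, or None at the end."""
    index = PIPELINE.index(current)  # raises ValueError for an unknown name
    if index == len(PIPELINE) - 1:
        return None
    return PIPELINE[index + 1]
```

Keeping the order in one place makes it harder for a script to skip an environment by accident.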
QA Delivery Options
Delivering code from the build server to the integration environment can be done using one of the models described in the baseline scenario. Moving from integration to QA needs a bit more thought as it involves multiple teams. Here are some of the patterns that I’ve seen used successfully.
In the developer-initiated model the developers choose when to spot-check and promote builds to QA. This model is used when QA is, for lack of a better term, subservient to development. While at first glance this may sound great for the developers, it usually implies that there is a problem elsewhere. For example, if there is an ongoing quality issue the QA staff may be spending a lot of time simply waiting for bugs that are blocking their work to be corrected.
In extreme cases it may be necessary to set up an automated promotion to the QA environment on a timer.
A model in which QA initiates the promotion is more typical and will work for most teams. The developers are still involved; they need to spot-check their work in the integration environment and certify builds as good or bad.
Under this model QA pulls the most recent “known good” build whenever they are ready to test a new feature. This is usually done by the QA manager, who generally has the best visibility into the needs of the QA staff. That said, some QA teams allow any member to pull down new builds.
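Selecting the most recent “known good” build amounts to filtering on the developers' certification and taking the highest build number. A minimal sketch, assuming a hypothetical `Build` record with an auto-incrementing number and a certification flag:

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Build:
    number: int            # auto-incrementing build number
    certified_good: bool   # set by a developer after the integration spot-check


def latest_known_good(builds: Iterable[Build]) -> Optional[Build]:
    """Return the newest build certified as good, or None if there is none."""
    good = [b for b in builds if b.certified_good]
    return max(good, key=lambda b: b.number, default=None)
```

Note that a newer build which has not yet been certified, or which was certified as bad, is never offered to QA.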
Test Runner Initiated
For companies that are truly dedicated to automated testing this is the goal. Once builds hit the integration server the entire suite of automated tests is run. If they all pass, the build is automatically promoted to QA. As with other automatic deliveries this can be done on a per-check-in or timer basis.
One should not underestimate the investment this model requires. Not only must there be a comprehensive test suite available, all tests must be passable. The build server isn’t going to be able to distinguish between a failing test that indicates newly broken functionality and a failing test that indicates something that should be addressed at some distant point in the future.
One work-around for this is to break up the tests into a must-pass project and a provisional project. Tests start in the provisional project, especially when they are used for TDD-style programming. Once the tests are verified to be correct and useful, and the code can pass them, they are promoted to the must-pass project. The build server does not even run the provisional tests, but it honors the results of the must-pass tests.
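The gating rule can be sketched as follows. The registry format, the suite labels, and the function names are assumptions for illustration; in practice the split would more likely be two test projects that the build server is configured to run or ignore.

```python
from typing import Callable, Dict, List, Tuple

MUST_PASS = "must-pass"
PROVISIONAL = "provisional"

# test name -> (suite label, test function that raises on failure)
TestRegistry = Dict[str, Tuple[str, Callable[[], None]]]


def failing_must_pass(tests: TestRegistry) -> List[str]:
    """Run only the must-pass tests and return the names of those that fail."""
    failures = []
    for name, (suite, test) in sorted(tests.items()):
        if suite != MUST_PASS:
            continue  # provisional tests are not run at all
        try:
            test()
        except Exception:
            failures.append(name)
    return failures


def should_promote(tests: TestRegistry) -> bool:
    """A build is promoted to QA only when no must-pass test fails."""
    return not failing_must_pass(tests)
```

A provisional test that fails, or that would fail if it were run, has no effect on the promotion decision.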
Staging/Production Delivery Options
Under the continuous delivery philosophy QA has only two options for handling a given build that it receives. The build can either be failed or be promoted to staging. QA does not sit on a working build while waiting for other features to be completed.
This does raise a question about what it means for a build to be “working”. A working build is any build that can be safely put into production. If it has incomplete features, but those features work as-is, then the build moves forward. Builds cannot be held back unless there is a specific failing that will interfere with the use of the application in a production setting.
Once in staging the build should be promoted to production during the next release cycle. While continuous deployment to production is not always desirable, weekly or even daily deployments are not unheard of. The essential point is that once a build is proven to be good it needs to swiftly move into production so that the staff can focus on the next set of features. Builds that linger for weeks or even months in staging will cause no end of problems.
The reason one has separate QA and staging environments is to facilitate workflow. As soon as a build is promoted to staging the next build is pulled up from integration. Thus staging always has a stable environment for stakeholders and other third parties to examine while QA still has an environment to work with. Were the roles combined, QA would be blocked during the times when their environment is frozen.
Once you start having multiple environments the configuration files can become a serious problem. For example, a staging server must be properly configured so it won’t do things like send out test emails to all of your customers or place orders through a live payment gateway. One apprentice developer I worked with tried to purchase several million dollars in bonds via a misconfigured test server. (Fortunately in this case the price per bond was higher in production and the order wasn’t filled.)
This situation arose because the production configuration settings were stored in source control and deployed along with the application. This was done to avoid a similar problem where non-production settings were in source control and deployed to production machines.
A surprisingly easy way to avoid the above problems is to simply not allow the build agent’s network account to have write access to the configuration files. Then if someone were to accidentally check in a configuration file, the deployment would fail and the error could be corrected.
Unfortunately this can lead to its own problems. Whenever a new configuration value is needed under this scheme it has to be manually applied. Failure to do so will result in a broken environment, a risky proposition when updates can occur at any time.
Separation of Concerns and Configuration Files
While the term “separation of concerns” is often used to justify otherwise absurd design decisions, it does have a useful role when one seriously thinks about who is concerned with what. For example, people in the production support role are not concerned with what logging framework the developers choose to inject. They are, however, concerned about database connection strings and the email address error alerts are sent to.
These are the types of configuration files I recommend:
Environmental Settings: Values that are environment-specific and must be set up separately for each server. These tend to be changed only when major events call for it, such as bringing a new database or file server online.
Code as Configuration: These are things such as an XML file that drives a dependency injection framework. While it looks like configuration, it should never be touched by anyone besides a developer. These files should be stored in source control and be marked as read-only on the servers.
Fine Tuning: These are settings that are not environment specific, but may need to be touched by production support. This would include items such as the batch sizes for bulk uploads or the timeout for web page requests.
Of the three types of configuration, the fine tuning one takes the most effort to get right. Ideally it would have defaults specified in a file under source control and overrides added to machine-specific, non-versioned files.
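A minimal sketch of the defaults-plus-overrides arrangement for fine tuning settings follows. The use of JSON, the setting names, and the rule that an override must correspond to an existing default are all assumptions made for the example.

```python
import json
from typing import Any, Dict


def load_fine_tuning(defaults_json: str, overrides_json: str = "{}") -> Dict[str, Any]:
    """Merge versioned defaults with machine-specific, non-versioned overrides.

    An override for a setting that has no default is rejected, which catches
    typos and settings that the application no longer reads.
    """
    defaults = json.loads(defaults_json)
    overrides = json.loads(overrides_json)
    unknown = set(overrides) - set(defaults)
    if unknown:
        raise KeyError(f"override for unknown setting(s): {sorted(unknown)}")
    return {**defaults, **overrides}
```

For example, defaults of `{"batch_size": 500, "timeout_seconds": 30}` combined with an override file containing `{"batch_size": 100}` yield a batch size of 100 and an unchanged timeout. Because the defaults ship with the build, a new setting never leaves an environment broken; production support only touches the override file when a value actually needs to differ.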
Configuration and Training
A useful technique to avoid adding unnecessary configuration settings is to simply require documentation and training for each setting. If you cannot justify spending the time to teach production support when and why to adjust some value, then they are not going to be capable of doing so anyway and thus the value shouldn’t be configurable.
Scenario 3: Multiple Teams with Service Oriented Architectures
When working with Service Oriented Architectures it is common to employ multiple teams. For example, it is quite common for one team to build the databases and services while a second team handles the user interfaces. In some cases the two teams may be so closely related members are often traded between them. In others, the teams may be from different companies on opposite sides of the world. No matter how they are divided, the basic pattern is the same.
As in scenario 2, the services team needs a place to test out builds where they won’t negatively impact anyone outside of their own team. Meanwhile the UI developers need a server that can be trusted to be stable at all times; otherwise they will be unable to do their own work. Thus it is essential that a “development environment” be created in addition to the other environments we find in single-team scenarios.
Beyond the UI Integration phase we see the same sequence as in scenario 2. The delivery options are the same, with the caveat that only the UI team is involved in this process. The opinion of the UI team as to the quality of the build must take precedence over that of the services team.
Development Delivery Options
When and how to deliver code to the development environment can be the source of much tension. When a bad build hits the QA environment the QA team can simply fail it and turn to other tasks such as preparing tests for new features or improving their regression suite. If a bad build hits the development environment then the entire UI team can find themselves unable to work. So while any of the delivery models seen for QA delivery can work, the option based on automated tests is by far the most successful model.
Side Bar: Who writes the integration tests?
When dealing with a separate service layer, it is essential that both the producing and consuming teams write automated tests. The team producing the service layer is the most knowledgeable about the internal workings of said layer and thus can write the kinds of tests others wouldn’t even know are necessary.
This, however, doesn’t excuse teams consuming the service layer from writing tests. Their tests cover not only scenarios the service writers didn’t think about, but also test their understanding of the service layer. For example, the UI developers may assume a given call will never return a null or negative value. By testing all of the parameter combinations the UI is actually using, they can rest assured that their assumption is correct.
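A consumer-side test of this kind might look like the sketch below. The service call `get_open_order_count`, its parameters, and its data are hypothetical stand-ins; a real test would call the service deployed in the development environment rather than an in-memory function.

```python
import itertools
import unittest


def get_open_order_count(customer_id: str, include_backorders: bool) -> int:
    """Hypothetical stand-in for a call into the service layer."""
    orders = {"c-1": (2, 1), "c-2": (0, 0)}  # customer -> (open, backordered)
    open_orders, backordered = orders.get(customer_id, (0, 0))
    return open_orders + (backordered if include_backorders else 0)


class OpenOrderCountAssumptions(unittest.TestCase):
    # Only the parameter combinations the UI actually sends.
    CUSTOMER_IDS = ["c-1", "c-2", "unknown"]
    FLAGS = [True, False]

    def test_never_null_or_negative(self):
        combos = itertools.product(self.CUSTOMER_IDS, self.FLAGS)
        for customer_id, include_backorders in combos:
            with self.subTest(customer_id=customer_id, include_backorders=include_backorders):
                result = get_open_order_count(customer_id, include_backorders)
                self.assertIsNotNone(result)
                self.assertGreaterEqual(result, 0)
```

If the services team later changes the call to return a null for unknown customers, this test fails in the consuming team's suite before the UI does.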
If your company is blessed with QA Engineers, as opposed to mere QA Analysts, they too may find themselves writing automated tests against the services layer. This is often done in conjunction with automated UI testing, especially when the results of the action are not necessarily verifiable via the user interface.
Scenario 4: Multiple Teams with Parallel Feature Development
This is where things get really tricky. So far every scenario discussed assumed there was a single development branch. Once you start dealing with multiple development teams working in parallel on the same code base you have to decide when and how to move features from team branches into the primary development stream. Here are the two models that I’ve seen work successfully.
Feature Push Model
In the feature push model each team can push their changes into the main branch whenever they are ready. This model has the advantage of allowing the teams to be self-sufficient.
The most common strategy under this model is to merge and test in the local branch. Once the tests have been passed the change set is pushed to the main branch.
The biggest risk in this model is the lack of atomic merges. It is quite possible that one team will change a function’s name or signature while another team is adding a new file that uses that same function. If both changes are checked in at the same time then the build will fail even though the source control system didn’t report any conflicts.
Guarding against this requires a source control system that supports locking. When it is time to push a new build the main branch is locked. The new features are merged with the main branch locally, smoke tested, and then pushed up to the main branch. While merge issues can be resolved under the lock, any test failure must result in the lock being immediately released.
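The lock, merge, test, and push sequence can be sketched as below. Everything here is a stand-in: the `threading.Lock` plays the role of the source control system's branch lock, branches are plain lists of change set names, and `smoke_test` is whatever check the team runs against the merged result.

```python
import threading
from typing import Callable, List

# Stand-in for the source control system's lock on the main branch.
main_branch_lock = threading.Lock()


def push_feature(main: List[str], feature: List[str],
                 smoke_test: Callable[[List[str]], bool]) -> bool:
    """Merge `feature` into `main` under the lock; return True if pushed.

    The lock is released as soon as the `with` block exits, including the
    early return taken when the smoke test fails.
    """
    with main_branch_lock:
        candidate = main + feature      # local merge against the locked branch
        if not smoke_test(candidate):
            return False                # main is left untouched
        main[:] = candidate             # push the merged result
        return True
```

Because no other team can push while the lock is held, the merged code that was smoke tested is exactly what lands on the main branch.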
Feature Pull Model
In the feature pull model teams are never allowed to publish their changes. Instead someone on the change control team is responsible for pulling features into the main branch. This allows the QA team to only receive the changes that they are ready to test.
The feature pull model all but requires an advanced source control system that supports integrated work item tracking. Merely tagging change sets with task numbers isn’t enough; one has to be able to say “merge feature X into branch Y” and have the source control identify every change set that is needed. This can be done manually, but it is exceedingly time consuming and error prone.
For simple merges the change control engineer can generally handle it alone. For more complex merges, especially when the teams haven’t been rebasing their branches on a regular basis, the team that developed the feature will need to assist.
Regardless of how the features make their way into the main stream the structure is going to be the same. Each team gets its own integration environment that they are continuously publishing to just as if we were in the baseline scenario. These team-specific integration environments feed into the common integration environment which then flows onward as normal.
What about hot fixes?
Throughout all of these scenarios we have never mentioned the concept of a hot fix. This was intentional; there is no such thing as a hot fix when abiding by the philosophy of continuous delivery. Once changes hit the integration environment they need to move quickly into production. Under this theory you don’t need hot fixes; you just have normal bug fixes that happen to take priority over feature development.
Unfortunately the real world doesn’t always measure up to the promises made by the theories. From time to time features will be stuck in QA for longer than desirable due to either quality issues or merely their size. Likewise a production deployment may be delayed due to business needs such as contractual obligations or a well-publicized upgrade scheduled for a specific date. When such events occur a hot fix may become necessary. At this point the best solution is to throw out the process and just do it. Don’t allow formalities to place an undue burden on your company and customers. Once the dust settles and the crisis has been overcome you can start looking into why it happened.
The goal of continuous delivery is not to make hot fixes easier to handle. The goal is to develop coding and testing standards that eliminate the need for hot fixes. Every time the process fails you have an opportunity to learn how to improve your coding standards and testing practices so that major bugs don’t happen. Likewise, it gives you reason to examine the flaws in your scheduling policies that resulted in the pipeline stall. If you don’t focus on both aspects you will never get to the point where all bug fixes can go through the same regimented procedure.
In short, continuous improvement is an essential component of any form of continuous delivery.
About the Author
Jonathan Allen has been writing news reports for InfoQ since 2006 and is currently the lead editor for the .NET queue. If you are interested in writing news or educational articles for InfoQ please contact him at firstname.lastname@example.org.
One tool end to end is stronger than several
That's true, but it's also a bad thing. One of the key benefits of such a system should be to provide an end-to-end record for a release to production. It was built here, in this way, from this source code, deployed there, tested in that way, and finally promoted to production. We should know who ran the deployment, what the source code changes are, and the test results. Now... if you don't really care about any of that, then perhaps the tooling doesn't matter much and we're back to the days of a couple scripts, cron and a simple gui. But there are reasons vendors have spent years developing these systems and providing visibility and audit on top of simple automation is one of them.
For configuration, I'm a fan of templated configuration files in source control. On deployment, the CD server should insert the environment-specific configuration. Who manages that is now a matter of security in the CD system rather than user-account permissions on the deployment machine, for better and worse (mostly better).
A bit too theoretical
Re: A bit too theoretical
Configuration practice - anti-pattern?
Your article appears to suggest that I leave my configuration out of my source control system. One of the primary goals of continuous delivery as a collective set of practices is to reduce the risk of release by reducing unknown change to the variants within the system. If I do as you suggest, and not check my configuration files into a version control system I'm opening myself up to possible change to these files without a way to tell.
Could you clarify what you're suggesting in terms of configuration management?
Re: A bit too theoretical
The thrust of our book is intended to be the opposite of theoretical. We describe the automation of your development process as a "deployment pipeline". Each stage of the pipeline builds on the success of the previous stages; we recommend never returning to an earlier stage, or skipping any stage for any change. In this way it is the antithesis of "let's do random tasks and put it all to production in chaotic order"; it is rather "when we deploy into production let's do PRECISELY the same things that we have already done successfully, several times before".
Every activity is automated and that automation is the ONLY mechanism used to deploy release-candidates to test and production environments. Every last bit and byte of the production system is version controlled and so the whole environment is as deterministic as we can make it.
My own organization, and many others, have been employing this practice successfully for some years. Some of the biggest and most complex applications are deployed this way with NO MANUAL INTERVENTION in the release process other than selecting a candidate and pressing the "GO" button. I can assure you that this is not a thought experiment or a practice that only scales to "toy projects" but rather, in my experience at least, it is the most effective way to release software and avoid mistakes that we have come up with so far.