The sooner a feature gets into production, the sooner it starts adding value. The quicker a system can change in response to user feedback, the easier it is to keep users happy. Timothy Fitz and Joe Ludwig have recently published articles describing practical implementations of continuous deployment, a process that reduces the release cycle from weeks to minutes.
Timothy's first article examined the impact that continuous deployment could have on the cost of fixing bugs. The more time that passes between when an error is introduced into a system and when it is found, the more difficult and expensive it is to fix. If the engineer sees the mistake right after typing it, the cost of the bug is essentially zero. If the compiler catches the bug, the cost, in terms of developer time, is likely measured in minutes. If the bug gets deployed into production but goes unnoticed for some time, the cost to find and fix it can be enormous; the industry saw a dramatic example of this with Y2K. Timothy's position is that it is better to fail fast, so that the impact and cost of bugs can be minimized.
The comments posted by readers indicated significant skepticism about the practicality of continuous deployment. Erik A. Brandstadmoen put it bluntly: "In real life, I don’t think [your] approach is good enough." A commenter on ycombinator said: "ah... no. Maybe this is just viable for a single developer, as a substitution for continuous integration. But with multiple developers checking in, on a complex system your site will be down. A lot."
In response to the skeptics, Timothy wrote about how IMVU continuously deploys its system. The process starts with continuous integration to quickly build and test new changes. One of the keys is an extensive, and extremely reliable, set of automated tests; IMVU employs a farm of test machines to keep the runtime for the entire test suite under 10 minutes. Once all of the tests have passed, deployment begins. Timothy describes the process:
The code is rsync’d out to the hundreds of machines in our cluster. Load average, cpu usage, php errors and dies and more are sampled by the push script, as a basis line. A symlink is switched on a small subset of the machines throwing the code live to its first few customers. A minute later the push script again samples data across the cluster and if there has been a statistically significant regression then the revision is automatically rolled back. If not, then it gets pushed to 100% of the cluster and monitored in the same way for another five minutes. The code is now live and fully pushed.
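Timothy's description amounts to a canary-style rollout driven by a metric baseline and an automatic rollback. The sketch below shows how such a push script might be structured; the host names, paths, the single load-average metric, and the three-sigma cutoff are all illustrative assumptions rather than IMVU's actual tooling.

```python
# Hypothetical sketch of a canary-style push script, in the spirit of the process
# quoted above. Everything concrete here (hosts, paths, commands, thresholds)
# is an assumption for illustration only.
import statistics
import subprocess
import time

HOSTS = [f"web{i:03d}.example.com" for i in range(1, 201)]  # assumed cluster
CANARY_COUNT = 10                                           # assumed canary size
SIGMA_LIMIT = 3.0                                           # assumed "statistically significant" cutoff


def run(host, command):
    """Run a shell command on one host; a real script would handle per-host failures."""
    subprocess.run(["ssh", host, command], check=True)


def sample_metric(host):
    """Sample one health metric (1-minute load average) from a host."""
    out = subprocess.run(["ssh", host, "cat /proc/loadavg"],
                         check=True, capture_output=True, text=True)
    return float(out.stdout.split()[0])


def is_regression(baseline, current):
    """Flag a regression if the new mean drifts more than SIGMA_LIMIT
    standard deviations from the baseline mean."""
    mean = statistics.mean(baseline)
    stdev = statistics.pstdev(baseline) or 1e-9
    return abs(statistics.mean(current) - mean) / stdev > SIGMA_LIMIT


def push(revision):
    for host in HOSTS:                        # stage the new code on every machine
        subprocess.run(["rsync", "-a", f"build/{revision}/",
                        f"{host}:/srv/app-{revision}/"], check=True)
    baseline = [sample_metric(h) for h in HOSTS]   # sample the cluster as a baseline

    canary = HOSTS[:CANARY_COUNT]             # switch a small subset live first
    for host in canary:
        run(host, f"ln -sfn /srv/app-{revision} /srv/app-live")
    time.sleep(60)
    if is_regression(baseline, [sample_metric(h) for h in HOSTS]):
        for host in canary:                   # significant regression: roll back automatically
            run(host, "ln -sfn /srv/app-previous /srv/app-live")
        return False

    for host in HOSTS:                        # full rollout, then keep watching
        run(host, f"ln -sfn /srv/app-{revision} /srv/app-live")
    time.sleep(300)
    if is_regression(baseline, [sample_metric(h) for h in HOSTS]):
        for host in HOSTS:
            run(host, "ln -sfn /srv/app-previous /srv/app-live")
        return False
    return True
```

The important property is the one the quote emphasizes: every push is compared against a baseline on a small slice of the cluster before it reaches everyone, and a bad revision backs itself out without a human in the loop.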
With 60 employees, 30 million registered users, and over a million dollars a month in revenue, what IMVU has created is certainly not trivial. An examination by Michael Bolton and James Bach shows that the system is also not perfect. Elisabeth Hendrickson put this in context by pointing out that perfection is likely not the goal of the system.
Joe Ludwig, a former architect of Pirates of the Burning Sea, wrote two articles examining what it would really take to do continuous deployment in an environment with heavyweight client code. He starts with a description of the seven-and-a-half-hour deploy process for 'Pirates' and outlines what it would take to reduce that to one hour. In his second article, he describes some of the important technical changes that would be required to make the one-hour deploy a reality.
What is your experience with continuous deployment? What would have to change about the systems you work with in order to make them continuously deployable? Leave a comment and share.
Community comments
I dreamed of a similar build system, once...
by Raffaele Guidi,
...though on a much smaller scale, and brought it to the first step: continuous deployment on test machines. But I knew (and now that I've read this, I'm sure) it would work. An excellent case study, and great stuff!
Seen this done
by Chris Johnston,
On the last two projects I have worked on, we had continuous deployment of a sort to our test boxes. On the first project, we were able to hook up a build in CruiseControl.net that allowed anyone working on the project to deploy with a single click by issuing a build.
On the second project, they had built a Rails application that allowed anyone to deploy any part of the application to any of our nine test boxes. The Rails app would then show the progress of the deployment and whether or not it was successful.
Neither project could be automatically deployed to production, but there was nothing technical stopping that from happening either.
With a combination of DBDeploy, Capistrano, and other utilities that allow you to manage all aspects of a project, there is nothing stopping any shop from doing either continuous or automatic deployments. And once you have automatic deployments, how far behind is continuous deployment?
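A minimal sketch of the kind of "deploy anything to any test box, with progress reporting" tool described above might look like the following; the host names, the /opt/deploy/bin/deploy command, and the component names are hypothetical, not the commenter's actual setup.

```python
# Sketch of a one-click deploy helper: push a component revision to a pool of
# test boxes and report progress per host. All names are assumptions.
import subprocess

TEST_BOXES = [f"test{i:02d}.internal" for i in range(1, 10)]  # assumed: nine test boxes


def deploy(component, revision, host):
    """Deploy one component revision to one test box and report the result."""
    result = subprocess.run(
        ["ssh", host, f"/opt/deploy/bin/deploy {component} {revision}"],  # hypothetical deploy command
        capture_output=True, text=True)
    status = "ok" if result.returncode == 0 else "FAILED"
    print(f"{host}: {component}@{revision} ... {status}")
    return result.returncode == 0


def deploy_everywhere(component, revision):
    """The same job a CI post-build step could trigger automatically."""
    results = [deploy(component, revision, host) for host in TEST_BOXES]
    return all(results)


if __name__ == "__main__":
    deploy_everywhere("web-frontend", "r4711")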
This is the only way
by Evan Worley,
If systems are designed with this requirement in mind, continuous deployment is very achievable. The Maven+Hudson combo goes a long way toward providing these capabilities. One sticky area we've experienced is service-level integration, for cases when a deployment consists of an entire stack: app, backend services (with inter-dependencies), and so on. We ended up doing some clunky service-level integration steps, but they didn't feel very good :P. As the author pointed out for IMVU, it's critical that a deployment can be rolled out slowly and undone if it fails. I too often see teams strive for 100% perfect deployments. Things will go wrong, there will be bugs, and the continuous test suite won't catch everything. In my opinion, the ideal solution strikes a balance between time to deliver features and time between bug injection and bug detection.
-Evan
Re: This is the only way
by tan bronson,
We've also had great luck with CruiseControl.net, and a home grown equivalent to DBDeploy.
(Every data object script was responsible for upgrading itself, and included a list of objects that needed to be managed before it could update itself.)
Now we're in a Hibernate environment, and we've been reduced to manually diffing schemas and writing scripts from that. (This is then delegated to the DBA team.)
Are there tools to help automate database changes in Hibernate-driven environments?
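One way to picture the "every script lists what must be managed before it can update itself" approach is a small dependency-ordered migration runner, sketched below; the table names and SQL are invented for illustration, and a Hibernate setup would still need the schema diff to produce the statements in the first place.

```python
# Sketch of a dependency-ordered migration runner: each script declares the
# objects it depends on, and the runner applies scripts in a safe order.
from graphlib import TopologicalSorter  # Python 3.9+

# name -> (SQL to apply, objects that must be handled first); all names invented
MIGRATIONS = {
    "users":  ("ALTER TABLE users ADD COLUMN last_login TIMESTAMP NULL", []),
    "orders": ("ALTER TABLE orders ADD COLUMN user_note VARCHAR(255)", ["users"]),
    "audit":  ("CREATE TABLE audit (id INT PRIMARY KEY, order_id INT)", ["orders"]),
}


def run_migrations(execute_sql):
    """Apply each script only after everything it depends on has been applied."""
    graph = {name: deps for name, (_, deps) in MIGRATIONS.items()}
    for name in TopologicalSorter(graph).static_order():
        sql, _ = MIGRATIONS[name]
        execute_sql(sql)  # a real runner would use a transaction and record the applied version


if __name__ == "__main__":
    run_migrations(print)  # print the statements in the order they would be applied
```

Running it prints the statements in dependency order (users, then orders, then audit); the same ordering idea works whether the steps are hand-written scripts or generated by a schema diff.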
The example given in the first article
by Ramazan YILDIRIM,
A typo can have a substantial effect on a software system. Its side effects may not be easy to undo or roll back; in rare cases it may even ruin a system. A typo might also affect other "unrelated" parts of a system unexpectedly, rather than the parts you expect, and especially in large systems this can be hard to discover. And if you are doing continuous deployment on a large-scale project, that can mean many unrelated fixes deployed at once, and you should not forget that many of the problems will not be discovered immediately after deployment. Thus the isolated-deployment principle won't work.
So the example given in the article doesn't hold, even with extensive automated testing. I don't object to the whole idea of continuous deployment, but the only example in the article is misleading.
Continuous Deployment Blog
by Tim Bassett,
If you are interested in more continuous deployment stories, challenges, and successes, please take a look at this blog: ciadvantage.com/cs/blogs/tim_bassett/default.aspx