The sooner a feature gets into production, the sooner it starts adding value. The quicker a system can change in response to user feedback, the easier it is to keep users happy. Timothy Fitz and Joe Ludwig have recently published articles describing practical implementations of continuous deployment, a process that reduces the release cycle from weeks to minutes.
Timothy's first article examined the impact that continuous deployment could have on the cost of fixing bugs. The more time that passes between when an error is introduced into a system and when it is found, the more difficult and expensive it is to fix. If the engineer sees the mistake right after typing it, the cost of the bug is essentially zero. If the compiler catches the bug, the cost, in terms of developer time, is likely measured in minutes. If the bug gets deployed into production but goes unnoticed for some time, the cost to find and fix it can be enormous; the industry saw a dramatic example of this with Y2K. Timothy's position is that it is better to fail fast, so that the impact and cost of bugs can be minimized.
The comments posted by readers indicated significant skepticism about the practicality of continuous deployment. Erik A. Brandstadmoen put it bluntly: "In real life, I don’t think [your] approach is good enough." A commenter on ycombinator said: "ah... no. Maybe this is just viable for a single developer, as a substitution for continuous integration. But with multiple developers checking in, on a complex system your site will be down. A lot."
In response to the skeptics, Timothy wrote about how IMVU continuously deploys its system. The process starts with continuous integration to quickly build and test new changes. One of the keys is an extensive, and extremely reliable, set of automated tests; IMVU employs a farm of test machines to keep the runtime for the entire test suite under 10 minutes. Once all of the tests have passed, deployment begins. Timothy describes the process:
The code is rsync’d out to the hundreds of machines in our cluster. Load average, cpu usage, php errors and dies and more are sampled by the push script, as a basis line. A symlink is switched on a small subset of the machines throwing the code live to its first few customers. A minute later the push script again samples data across the cluster and if there has been a statistically significant regression then the revision is automatically rolled back. If not, then it gets pushed to 100% of the cluster and monitored in the same way for another five minutes. The code is now live and fully pushed.
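Timothy's description amounts to a canary-style rollout driven by a metric baseline and an automatic rollback. The sketch below shows how such a push script might be structured; the host names, paths, the single load-average metric, and the three-sigma cutoff are all illustrative assumptions rather than IMVU's actual tooling.

```python
# Hypothetical sketch of a canary-style push script, in the spirit of the process
# quoted above. Everything concrete here (hosts, paths, commands, thresholds)
# is an assumption for illustration only.
import statistics
import subprocess
import time

HOSTS = [f"web{i:03d}.example.com" for i in range(1, 201)]  # assumed cluster
CANARY_COUNT = 10                                           # assumed canary size
SIGMA_LIMIT = 3.0                                           # assumed "statistically significant" cutoff


def run(host, command):
    """Run a shell command on one host; a real script would handle per-host failures."""
    subprocess.run(["ssh", host, command], check=True)


def sample_metric(host):
    """Sample one health metric (1-minute load average) from a host."""
    out = subprocess.run(["ssh", host, "cat /proc/loadavg"],
                         check=True, capture_output=True, text=True)
    return float(out.stdout.split()[0])


def is_regression(baseline, current):
    """Flag a regression if the new mean drifts more than SIGMA_LIMIT
    standard deviations from the baseline mean."""
    mean = statistics.mean(baseline)
    stdev = statistics.pstdev(baseline) or 1e-9
    return abs(statistics.mean(current) - mean) / stdev > SIGMA_LIMIT


def push(revision):
    for host in HOSTS:                        # stage the new code on every machine
        subprocess.run(["rsync", "-a", f"build/{revision}/",
                        f"{host}:/srv/app-{revision}/"], check=True)
    baseline = [sample_metric(h) for h in HOSTS]   # sample the cluster as a baseline

    canary = HOSTS[:CANARY_COUNT]             # switch a small subset live first
    for host in canary:
        run(host, f"ln -sfn /srv/app-{revision} /srv/app-live")
    time.sleep(60)
    if is_regression(baseline, [sample_metric(h) for h in HOSTS]):
        for host in canary:                   # significant regression: roll back automatically
            run(host, "ln -sfn /srv/app-previous /srv/app-live")
        return False

    for host in HOSTS:                        # full rollout, then keep watching
        run(host, f"ln -sfn /srv/app-{revision} /srv/app-live")
    time.sleep(300)
    if is_regression(baseline, [sample_metric(h) for h in HOSTS]):
        for host in HOSTS:
            run(host, "ln -sfn /srv/app-previous /srv/app-live")
        return False
    return True
```

The important property is the one the quote emphasizes: every push is compared against a baseline on a small slice of the cluster before it reaches everyone, and a bad revision backs itself out without a human in the loop.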
With 60 employees, 30 million registered users, and over a million dollars a month in revenue, what IMVU has created is certainly not trivial. An examination by Michael Bolton and James Bach shows that the system is also not perfect. Elisabeth Hendrickson put this in context by pointing out that perfection is likely not the goal of the system.
Joe Ludwig, a former architect of Pirates of the Burning Sea, wrote two articles examining what it would really take to do continuous deployment in an environment with heavyweight client code. He starts with a description of the seven-and-a-half-hour deploy process for 'Pirates' and outlines what it would take to reduce that to one hour. In his second article, he describes some of the important technical changes that would be required to make the one-hour deploy a reality.
What is your experience with continuous deployment? What would have to change about the systems you work with in order to make them continuously deployable? Leave a comment and share.
Community comments
I dreamed of a similar build system, once...
by Raffaele Guidi,
...though on a much smaller scale, and brought it to the first step: continuous deployment on test machines. But I knew (and now that I've read this, I'm sure) it would work. An excellent case study, and great stuff!
Seen this done
by Chris Johnston,
On the last two projects I have worked on, we had continuous deployment of a sort to our test boxes. On the first project, we were able to hook up a build in CruiseControl.net that allowed anyone working on the project to deploy with a single click by issuing a build.
On the second project, they had built a Rails application that allowed anyone to deploy any part of the application to any of our nine test boxes. The Rails app would then show the progress of the deployment and whether or not it was successful.
Neither project could be automatically deployed to production, but there was nothing technical stopping that from happening either.
With a combination of DBDeploy, Capistrano, and other utilities that allow you to manage all aspects of a project, there is nothing stopping any shop from doing either continuous or automatic deployments. And once you have automatic deployments, how far behind is continuous deployment?
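A minimal sketch of the kind of "deploy anything to any test box, with progress reporting" tool described above might look like the following; the host names, the /opt/deploy/bin/deploy command, and the component names are hypothetical, not the commenter's actual setup.

```python
# Sketch of a one-click deploy helper: push a component revision to a pool of
# test boxes and report progress per host. All names are assumptions.
import subprocess

TEST_BOXES = [f"test{i:02d}.internal" for i in range(1, 10)]  # assumed: nine test boxes


def deploy(component, revision, host):
    """Deploy one component revision to one test box and report the result."""
    result = subprocess.run(
        ["ssh", host, f"/opt/deploy/bin/deploy {component} {revision}"],  # hypothetical deploy command
        capture_output=True, text=True)
    status = "ok" if result.returncode == 0 else "FAILED"
    print(f"{host}: {component}@{revision} ... {status}")
    return result.returncode == 0


def deploy_everywhere(component, revision):
    """The same job a CI post-build step could trigger automatically."""
    results = [deploy(component, revision, host) for host in TEST_BOXES]
    return all(results)


if __name__ == "__main__":
    deploy_everywhere("web-frontend", "r4711")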
This is the only way
by Evan Worley,
If systems are designed with this requirement in mind, continuous deployment is very achievable. The Maven+Hudson combo goes a long way toward providing these capabilities. One sticky area we've experienced is service-level integration, for cases when a deployment consists of an entire stack: app, backend services (with inter-dependencies), and so on. We ended up doing some clunky service-level integration steps, but they didn't feel very good :P. As the author pointed out for IMVU, it's critical that a deployment can be rolled out slowly and undone if it fails. I too often see teams strive for 100% perfect deployments. Things will go wrong, there will be bugs, and the continuous test suite won't catch everything. In my opinion, the ideal solution strikes a balance between time to deliver features and time between bug injection and bug detection.
-Evan
Re: This is the only way
by tan bronson,
We've also had great luck with CruiseControl.net, and a home grown equivalent to DBDeploy.
(Every data object script was responsible for upgrading itself, and included a list of objects that needed to be managed before it could update itself.)
Now we're in a Hibernate environment, and we've been reduced to manually diffing schemas and writing scripts from that. (This is then delegated to the DBA team.)
Are there tools to help automate database changes in Hibernate-driven environments?
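One way to picture the "every script lists what must be managed before it can update itself" approach is a small dependency-ordered migration runner, sketched below; the table names and SQL are invented for illustration, and a Hibernate setup would still need the schema diff to produce the statements in the first place.

```python
# Sketch of a dependency-ordered migration runner: each script declares the
# objects it depends on, and the runner applies scripts in a safe order.
from graphlib import TopologicalSorter  # Python 3.9+

# name -> (SQL to apply, objects that must be handled first); all names invented
MIGRATIONS = {
    "users":  ("ALTER TABLE users ADD COLUMN last_login TIMESTAMP NULL", []),
    "orders": ("ALTER TABLE orders ADD COLUMN user_note VARCHAR(255)", ["users"]),
    "audit":  ("CREATE TABLE audit (id INT PRIMARY KEY, order_id INT)", ["orders"]),
}


def run_migrations(execute_sql):
    """Apply each script only after everything it depends on has been applied."""
    graph = {name: deps for name, (_, deps) in MIGRATIONS.items()}
    for name in TopologicalSorter(graph).static_order():
        sql, _ = MIGRATIONS[name]
        execute_sql(sql)  # a real runner would use a transaction and record the applied version


if __name__ == "__main__":
    run_migrations(print)  # print the statements in the order they would be applied
```

Running it prints the statements in dependency order (users, then orders, then audit); the same ordering idea works whether the steps are hand-written scripts or generated by a schema diff.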
The example given in the first article
by Ramazan YILDIRIM,
A typo can have a substantial effect on a software system. Its side effects may not be easy to undo or roll back; in rare cases it may even ruin a system. A typo might also affect other "unrelated" parts of a system unexpectedly, rather than the parts you expect, and especially in large systems this can be hard to discover. And if you are doing continuous deployment on a large-scale project, that can mean many unrelated fixes deployed at once, and you should not forget that many of the problems will not be discovered immediately after deployment. Thus the isolated-deployment principle won't work.
So the example given in the article doesn't hold, even with extensive automated testing. I don't object to the whole idea of continuous deployment, but the only example in the article is misleading.
Continuous Deployment Blog
by Tim Bassett,
If you are interested in more continuous deployment stories, challenges, and successes, please take a look at this blog: ciadvantage.com/cs/blogs/tim_bassett/default.aspx