Deployment is the Goal
The lean movement is growing rapidly. It's hard to dismiss this meme when one of the best carmakers in the world got there by being lean. We're getting better at applying that to the business of writing software, too. But what about the way we build then test the software? We're making a great meal in the kitchen, but we're failing to deliver it to some hungry diners. When they finally get their meal after about an hour, they won't take kindly to finding anything wrong with it. We need to get that pizza to the diners before the mozzarella congeals back into a hard lump.
Similarly, when we write software, we're very good at getting requirements and turning them into code. To turn that beautiful code into working software we need to deploy and test it. Often, we fail to give deployment and testing the same emphasis that we give to writing the code. Do you have a backlog of "code complete" software waiting to be deployed, tested, signed off and made live? Do you work in one- or two-week iterations but only release to production once a year? Is it a huge ceremony when you do finally deploy? Then you could be wallowing in your own waste. Perhaps your lean or agile transformation has further to go than you think.
If you need a cinematic cliché to establish that your film is set in London, show a quick few seconds of the neon signs and the traffic at Piccadilly Circus.
You'll probably want to go there and have a look for yourself. But where to eat? If you like sushi, leave Piccadilly Circus. Walk south-west down Haymarket towards St James's Park. Yo! Sushi is a couple of blocks down on the right-hand side. I recommend a seat at the bar rather than a booth so that you can see the chefs at work. A large, hot sake won't hurt, either.
At Yo! (okay, and at many other fine sushi restaurants around the world) the chefs have a direct connection between the kitchen and the diners: a conveyor belt. The belt runs past every table and every spot on the bar. You can speak to the chef if you want to. To eat, just pick a dish off the conveyor. You can place an order if you want to, but you can eat an entire meal via the conveyor. How does this relate to software and deployment?
The chefs in Yo! have a huge advantage over us software people; they can see the entire lifecycle of their product. If everybody is eating California Roll, they can see that they'll run out soon and make more. If nobody is having desserts, they know to quit making them. They can deliver a flow of food continuously (I've waited longer for a burger) and maintain a low inventory of freshly made sushi.
We don't work that way when we develop software; the developers (chefs) hand the code to someone to deploy it (kitchen hand), someone to test it (food hygiene inspector), and then we get the BA (waiter) to run it past the customer (diner) before getting the ops team (that other waiter who you only ever see once at the restaurant, who triumphantly brings you the food, as if they grew it themselves) to make the code live.
What do software and food have in common? You need to serve them both quickly, or they spoil. How often does your team deploy the software that you produce? How often do you deploy to real test environments that look like the production system?
Some code never sees the light of day. Some code waits a very long time to do its job. This waiting is one of the classic manifestations of waste; there's business value trapped inside your code, which cannot be realized until you deploy it to production. But first you need to get it to a test environment. Otherwise it's just rotting in the kitchen.
Why do we delay this vital process?
- Too many cooks: if you have to tag your code in a Version Control System and then hand it over to another team to compile, then yet another team to deploy, that's always going to be hard.
- Missing ingredients: you just wrote your new application in the wonderful new .NET 3.5? Oops, we don't have a production machine that runs that version of the framework (I saw this myself - it took about 6 months to get that application out).
- No food processor: deploying by hand can be error-prone and time-consuming.
- Poor training: we don't train our developers to do anything but write code. It's as if the other parts of the life-cycle don't exist.
Often, however, I think we simply fall into sloppy kitchen habits: we don't see why we should deploy the code until there's business value to deploy. By the time that you're ready, will you have time to get the deployment story right? Will deployment concerns mean that your code needs to change? There are many reasons why you might not get the code to production, but no reason to delay deployment to test systems.
At some point in the development process you have to deploy your code in a realistic way. We deploy code for several reasons:
- to know that we possess a working deployment process (do you have enough waiters? Is the conveyor belt working?),
- to functionally test it (does it taste good on its own?),
- to ensure that the code works with the rest of the ecosystem that it will live in (will it go with the other dishes?).
The first and last points seem to be lost on some project managers: the way that you deploy code can and does change the way that you develop it. I worked on a project where we could easily demonstrate our code running on the development operating system with all the external services stubbed out. We never actually got the code to pass its tests on a more realistic operating system and the production middleware.
I have worked on many projects where we couldn't deploy the software easily to production: code that is built against the wrong version of the production application server; code that won't meet security policy ("what do you mean, I can't connect to the Internet from there?"); and, stereotypically, code that only appears to run on one developer's computer.
Delaying the deployment of code that you write allows you to present apparent progress ("we're delivering code every iteration") at the cost of long-term success. You're essentially betting that you will be able to deploy the code at the end of the project. Addressing deployment early isn't a bet, however: it's an investment. You're making dessert before you cook the main, so you hedge against the risk of burning the crème brûlée ten minutes before you want to serve it. Agile methodologies are intended to help us manage risk. Sometimes we seem to court risk by developing our software in an isolated environment. If you want to feed a lot of people you need to cook in a restaurant kitchen, not on a portable grill.
In the TV shows 'Hell's Kitchen' or 'Ramsay's Kitchen Nightmares', you get to enjoy the spectacle of a perfectionist chef yelling at a hapless amateur cook for ruining the halibut. But can we throw code away like a spoiled plate of food? What we need is testing. Lots of it, and as early as you can. You'll want to make sure that you can deploy the code to a proper environment first, so you know that you're testing the right thing. Then go nuts: smoke tests, any acceptance test suite you have, and of course regression and exploratory testing. My observations on large software projects suggest that while we don't often neglect to have testers, we don't treat the QA process with the reverence that it deserves.
Testers are generally outnumbered by developers during IT projects. It's true that one tester should be able to verify the output of several developers. But how do you know how many you need? And do you make good use of the ones that you have? It's probably the case in your organization that code spends more time waiting for someone to test it than being tested. Do you try to move some of the burden of testing back onto developers? Reduce the output of the developers to a level that you can sustainably test? Or keep plugging away in the hope that it somehow works out?
Theory of Constraints
The title of this section is a reference to the Theory of Constraints. TOC was first published by Eliyahu M. Goldratt in 1984 and posits that we must consider the goal of an organization (or project) and that:
i) we must align all our activities toward achieving that goal;
ii) at any time, there will be one activity that presents a bottleneck in the rate at which your organization can reach its goal. In this case, there's no point in making the rest of your systems deliver any more unless you can widen or sidestep the bottleneck.
Prior to TOC, manufacturers would clog up their factories and warehouses with excess part inventory to bring the unit cost of parts (and therefore finished goods) down. This harmed their delivery of the finished goods; manufacturers would have to sift through warehouses to find the parts they needed to finish a product, or wait until a huge run of parts was machined before they could manufacture the parts that they needed for the most urgent orders. In the business of software, we run up different kinds of inventory in front of our own bottlenecks. Deployment and testing are critical steps toward reaching your goal of software running in production, and understanding your bottlenecks is critical to getting the best flow of working software through your organization.
The first point is pretty obvious; clearly you need to deploy the code in order to use it. But leaving it until the last minute would be analogous to building up an enormous inventory and then attempting to burn the midnight oil to get it through the final stage. But how many units can you get through that final stage? If you have a backlog of changes stuck in front of a bottleneck in your system, no amount of expediting is going to help you deliver it all. In fact, expediting will probably make things worse.
The second point refers to your team and their workload: just where is the work piling up? In my experience you don't often have a shortage of developers: you'll have a shortage of stories to work on, or a shortage of QAs to test the code when you think you're done with a story. Once you understand what the tightest bottleneck in your project is, you can do something to immediately improve the throughput.
That may be investing in more continuous integration hardware, diverting developers to work on systems integration tasks, or making an investment in developer testing.
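The bottleneck idea can be made concrete with a little arithmetic. The following is a minimal sketch in Python, not a model of any real project: the stage names and the capacities (stories per iteration) are assumptions invented for illustration. It finds the stage with the lowest capacity and shows where the inventory piles up when the developers keep cooking at full speed.

```python
# Hypothetical stage capacities, in stories per iteration. These numbers are
# illustrative assumptions, not measurements from a real project.
capacity = {"develop": 12, "deploy": 10, "test": 4, "release": 8}


def find_bottleneck(capacity):
    """Return the stage with the lowest capacity: it caps the whole pipeline."""
    return min(capacity, key=capacity.get)


def simulate(capacity, arrivals, iterations):
    """Push work through the stages in order.

    Each iteration, `arrivals` new stories enter the first stage. Every stage
    finishes at most its capacity and hands that work to the next stage.
    Returns (backlog waiting in front of each stage, total stories shipped).
    """
    backlog = {stage: 0 for stage in capacity}
    shipped = 0
    for _ in range(iterations):
        incoming = arrivals
        for stage, limit in capacity.items():
            backlog[stage] += incoming
            done = min(backlog[stage], limit)
            backlog[stage] -= done
            incoming = done  # finished work flows to the next stage
        shipped += incoming  # whatever leaves the last stage is live
    return backlog, shipped


bottleneck = find_bottleneck(capacity)
backlog, shipped = simulate(capacity, arrivals=12, iterations=5)
print("bottleneck:", bottleneck)
print("waiting in front of each stage:", backlog)
print("shipped:", shipped)
```

With these assumed numbers, five iterations leave 30 stories waiting for test and 10 waiting for deployment, while only 20 have shipped. Adding developers would change none of that; widening the test stage would.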
Why do we put so much effort into writing and building code, when we don't yet know if it will deploy? If you understand at all times that your eventual goal is to deploy working software to production, and bear in mind that you will get yelled at, Ramsay-style, if you can't deploy any software to production, you'll have the right priorities. When you can examine all the efforts of the project and see whether they contribute toward the goal, you're able to take corrective action.
On a recent project I was able to steal a chapter from Goldratt's book 'The Goal' and improve a bottleneck in the Continuous Integration service. Previously, the large number of developers working on the present iteration's stories would swamp our complex continuous integration process with changes. The developers trying to work on the production bug fixes would have to wait a long time to find out if their tiny bug fix checkins would pass through the CI service. I made a dedicated Continuous Integration service for the 50 developers working on the main branch of development. The old CI service was used for the production support branches. We widened the bottleneck by making these changes, and different workstreams weren't contending for resources.
The net effect was that when this work was complete our response times for production bug fixes or UAT fixes improved remarkably. A nice side-effect was that the people working on the next version delivered code faster too. Any improvement in the feedback loop matters when you have 50 developers working on the same codebase.
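To see why splitting the service helps, here is a rough sketch in Python of a CI service that builds one check-in at a time in arrival order. Everything in it is an assumption for illustration: the 20-minute build, the arrival times and the stream names "main" and "fix" are hypothetical, and the project described above was certainly messier than this.

```python
def feedback_times(commits, build_minutes):
    """Serve check-ins in arrival order on a single CI service.

    `commits` is a list of (arrival_minute, stream) tuples sorted by arrival.
    Returns a list of (stream, minutes from check-in to build result).
    """
    free_at = 0  # minute at which the service can start its next build
    results = []
    for arrival, stream in commits:
        start = max(arrival, free_at)
        free_at = start + build_minutes
        results.append((stream, free_at - arrival))
    return results


def worst_wait(results, stream):
    """Longest check-in-to-result time seen by one stream."""
    return max(minutes for name, minutes in results if name == stream)


# Hypothetical burst: five feature check-ins and one production bug fix.
BUILD_MINUTES = 20
commits = [(0, "main"), (1, "main"), (2, "fix"),
           (3, "main"), (4, "main"), (5, "main")]

# One shared service for everybody.
shared = feedback_times(commits, BUILD_MINUTES)

# One service per workstream.
main_only = feedback_times([c for c in commits if c[1] == "main"], BUILD_MINUTES)
fix_only = feedback_times([c for c in commits if c[1] == "fix"], BUILD_MINUTES)

print("shared:    fix", worst_wait(shared, "fix"), "main", worst_wait(shared, "main"))
print("dedicated: fix", worst_wait(fix_only, "fix"), "main", worst_wait(main_only, "main"))
```

In this toy queue the bug fix gets its result in 20 minutes instead of 58, and the slowest feature check-in drops from 115 minutes to 95, because neither stream is waiting behind the other's builds.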
In the past, developers thought that they were done if the code worked on their computers. I like to think that we have moved on: more recently developers are happy with the notion that if their code passes a CI build, they are done. Just as we arrive there, we need to change their goal to be writing code that deploys. Here's what to do:
- Only count a story as complete if you can deploy the code to a realistic environment. Your definition of 'done' needs to include 'deployed'.
- You need to test on grown-up systems. No amount of lightweight components will help you prove that your code will run on enterprise systems. If you have automated tests, great! Can you run them against an integration test environment? Or do they work just fine on your computer?
- Your production deployment process should be rehearsed thousands of times before you actually perform it for real. Automate the deployment process as much as you can. That's not enough though; I suggest that you exercise your deployment scripts as part of a Continuous Integration build.
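The points above could be wired into a build along these lines. This is a hedged sketch in Python, assuming that your deployment is a command that exits non-zero on failure and that you have some smoke check you can call afterwards; `rehearse_deployment`, `story_is_done` and the stand-in deploy command are hypothetical names made up for this example, not part of any CI product.

```python
import subprocess
import sys


def rehearse_deployment(deploy_command, smoke_check):
    """Run the deployment command, then a smoke check against the result.

    Returns (passed, reason) so that a CI step can fail the build with a
    useful message.
    """
    deploy = subprocess.run(deploy_command, capture_output=True, text=True)
    if deploy.returncode != 0:
        return False, f"deploy failed with exit code {deploy.returncode}"
    if not smoke_check():
        return False, "deployed, but the smoke check failed"
    return True, "deployed and smoke check passed"


def story_is_done(ci_build_passed, rehearsal_passed):
    """'Done' includes 'deployed': a green build alone is not enough."""
    return ci_build_passed and rehearsal_passed


# Stand-in for a real deployment script (an assumption for this sketch):
# a Python one-liner that exits with status 0.
fake_deploy = [sys.executable, "-c", "print('deploying to the test environment')"]

passed, reason = rehearse_deployment(fake_deploy, smoke_check=lambda: True)
print(reason)
print("story done:", story_is_done(ci_build_passed=True, rehearsal_passed=passed))
```

In a real build you would replace the stand-in command with your own deployment script and make the smoke check hit the freshly deployed system. Run on every check-in, that gives the deployment process its rehearsals long before the production performance.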
Once you know that your code deploys, you're one step closer to being done. You're only really done when your code is running in production - and there are no complaints from the diners.
About the Author
Julian Simpson helps bridge the gap between software development and IT operations. He's spent the last 5 years helping people with Continuous Integration, deployment, build tools and version control systems, on the Java, Ruby and .NET platforms. He writes about all this at his blog The Build Doctor, and on Twitter.
He has presented at Agile 2007, XP Day 2007 and QCon London 2009. Julian lives with his fiance and their children in Surrey, UK. In his spare time he likes to cycle, garden and play poker. Though not at the same time.