Untangling the Enterprise With Continuous Delivery
Buying the House
It always starts when you buy a house. You make rational decisions around neighborhood, price, nearby shopping, closeness to friends and family, and commutes. You make irrational decisions around the feel of the space, whether your favorite vacation picture will look good on the living room wall, and whether the guy walking his dog smiled or frowned at you. Gut response usually wins out. And that's OK. Buying a house is not primarily about making money; it's primarily about quality of life. The economics have to work, but at the end of the day it's about values and what really matters most in your life.
I bought a house recently. Everything is great: quiet, near the ocean, pastoral small farms and farmers' markets, close to an airport with regular flights, and great neighbors and friends nearby. And the house is in good shape for once. My last two places were real "fixer-uppers", which is so much fun when you're raising kids on a budget. For once, it's perfect.
Well. Almost perfect. You see, I would like a fair-sized garden. After living here a while, I've realized the best spot would be a meadow down below the house. It's sheltered by trees all around and gets lots of sun. It would be easy to service with water or a cistern. The ground is fairly level, and it just has a nice, warm feel to it that is aesthetically pleasing.
It's also home to the thorniest blackberry patch you can imagine. Thick, thorny, tangled stalks of blackberries have grown into an impenetrable jungle taller than me and bigger than the footprint of our entire house. Don't get me wrong, I like blackberries well enough, but this evil thatch has taken over the best part of our land. I can't begin on my long-term goal of a world-class garden until I've dealt with this twisted, prickly mess.
Designing the Garden
Congratulations on owning a pretty good enterprise-sized IT organization. OK, it's not a brand new organization, but it's in good shape, right? You've made intelligent buy-versus-build decisions. You're developing service boundaries to decouple systems. Your uptime for critical systems is well to the right of the decimal point. Your budgeting decisions are reasonable. For the most part, new projects finish well and upgrades generally work. Yup, all is bright and sunny in your IT organization. Sure, there are some murmurs about technical debt and a couple of aging systems, but nothing you can't fix in the next couple of years.
And then your CEO decides to build a garden. You sketch out the garden in broad terms. Marketing wants new loyalty features. Check. We've got a CRM system. Customer relations will need better information on certain segments of the existing customer base. Check. We've got reporting and data warehousing. We're going to expand our product line. Check. We've got inventory and warehouse management systems. We need to significantly upgrade the website so we can sell products online, not just through our retail stores.
Suddenly you stop saying "check". Customers have never ordered anything through your website before. Certainly your enterprise has all the pieces in place, since you fulfill orders every day, process payments, back order items, and track customers for follow-up. You start talking with various teams throughout the IT organization about how feasible this is, and suddenly you're not so confident.
Your fulfillment system runs overnight jobs to allocate inventory to stores. It was never designed for real-time fulfillment. Couldn't you just use the same technology to build a virtual store which could service the online site if we fix the fulfillment problem? Well, technically yes, but we'd need to upgrade the inventory system too. But the real problem may be matching customers to orders, since the database replication job runs when the store is closed. We have database replication jobs? I thought we used service-level integrations between systems? Not for the CRM system; it's been around too long. We'll need to upgrade the CRM, and it might break reporting. Why would it break reporting? Well, the way reporting was designed, it can't handle new channels like online. We've been meaning to fix that for a while, but other projects had priority. And has anyone thought about what this would do to financial reconciliation? And currently we couldn't refund online orders without a manual approval. Shouldn't that be easy to fix? Well, you would think so, but no.
The more you dig, the more problems you uncover. Quite soon you realize that you have a thorny, tangled blackberry patch.
Why Does This Happen?
It's rare to make large wholesale changes in a well-established enterprise. The individual systems become increasingly integrated over a series of years, or decades, which means that information flows through an organization in well-established patterns. For example, at a telco a customer's package of services has to be provisioned to physical hardware and networks. Switches are used in the customer's neighborhood. Routers are attached and configured for internet and phone services. The phone service is configured with features like a secondary line with a distinctive ring. When these services are provisioned, that information is passed to a billing system to manage regular payments for these services.
In this example the billing system doesn't have information about the router's model and network configuration. It doesn't need it to bill. The provisioning system doesn't care about the monthly price for secondary lines on a phone service or the payment history of the customer. It doesn't need it to configure the services. When these systems were initially integrated, the information and the business were probably simple. The telco sold only internet packages with two basic configurations: high speed and dial-up. Over time, more internet packages were added as internet technologies improved. New types of customers were added, like small businesses with different billing needs. New services were added, like VOIP phones on the same infrastructure. New billing features were added, like bundling of services. The complexity increased over time.
This type of complexity is evolutionary. The initial simple integration became more and more complex. The business rules within each system became more elaborate as they adapted to business growth. The integration of information between these systems became more detailed. Data became richer and more internally complex. And even if the combined complexity grows only at a linear rate, it becomes more and more difficult to make changes in one system without affecting the others. This is common and routinely happens in successful organizations.
Evolving a Solution
It's rare to make wholesale changes within an IT organization, since they are risky, expensive, and prone to failure. If your combined systems have evolved into a tangled web, then you need to lay the groundwork to evolve toward a solution. You have several options for improvement. A one-time step change of systems or practices can yield good, if mixed, results. For example, if you invest capital in a program to improve design or development skills, simplify and streamline infrastructure, retire redundant systems, or even replace parts of your enterprise, then you should see some returns on this investment. But once the burst of activity is finished, organizations generally fall back into a new normal state which is not significantly different from the old normal.
Continuous delivery (CD) and lean practices are a better route to longer-term, sustainable improvement of the enterprise. Most people think of CD practices when they're starting development of a single system, but I think the real value comes from untangling your enterprise. Systems that are difficult to upgrade or change need CD practices. Systems that are difficult to integrate need lean thinking. Moving an enterprise to leaner practices means getting increased value out of the entire integrated portfolio of systems; isolated changes don't contribute significantly to the larger value chain. This applies not only to newer greenfield systems, where you have a high degree of control over the delivery practices, but especially to larger and older systems that are developed in-house or by a vendor.
Where to Begin?
Start by looking in detail at the deployment and change practices around your individual key systems. Most likely there is a high degree of manual process and control for making changes. Apply value stream mapping, or even standard flowcharting techniques, to the most basic building block of systems management: the ability to successfully deploy a system into a new environment. Although many systems are upgraded in place, an upgrade rarely touches all parts of the system footprint. That footprint includes deploying all the artifacts needed to operate the system, applying an environment-specific configuration, setting up database schemas and base data, configuring endpoints for integrations, and ensuring that there is monitoring and logging. Deploying a system fresh is the only sure way to reach a known state and to understand the characteristics of changing any part of the whole system. Think of it as "green fielding" an existing system.
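To make the footprint concrete, a team could write it down as a checklist and compare it against what an in-place upgrade actually touches. The Python sketch below is a minimal illustration of that idea. The component names are hypothetical labels for the parts of the footprint listed above; a real checklist would come from your own value stream map.

```python
# Hypothetical labels for the parts of a system footprint; not a standard taxonomy.
FOOTPRINT = (
    "deploy_artifacts",        # binaries, packages, scripts needed to run the system
    "environment_config",      # environment-specific configuration
    "database_schema",         # schemas created or migrated
    "base_data",               # reference data the system needs to start
    "integration_endpoints",   # URLs, queues, credentials for other systems
    "monitoring_and_logging",  # probes, log shipping, alerts
)


def uncovered(steps_performed):
    """Return the footprint components a deployment procedure never touched."""
    performed = set(steps_performed)
    return [component for component in FOOTPRINT if component not in performed]


if __name__ == "__main__":
    in_place_upgrade = ["deploy_artifacts", "database_schema"]
    fresh_deploy = list(FOOTPRINT)
    print("Upgrade skips:", uncovered(in_place_upgrade))
    print("Fresh deploy skips:", uncovered(fresh_deploy))
```

A fresh deployment that leaves the list empty is the "known state" the text describes. Anything an upgrade skips is state that was inherited rather than deployed.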
Once you think you understand the process, actually measure a deployment through to a fully functional state in the new environment. Gather time and people measures for all the steps you identified and see if anything is missing. If the system has little or no deployment automation, keep it low tech at this point, using timers or stopwatches for each step. Identify the roles and experienced people needed for each step in the deployment process. For example, do you need a DBA with deep proprietary knowledge, or can any support engineer set up the database? Were there wait states between steps when nothing was happening? Make sure those are part of the measures. Was it difficult to verify that the system was configured properly? Is everything actually operating correctly, including integration points and logging? Was it difficult to set up a base set of data so the system could run correctly? Did you need to copy or replicate any component, configuration, or data from an existing environment to be successful?
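Stopwatch readings need nothing more than a spreadsheet, but it helps to be explicit about the arithmetic. This Python sketch assumes each step is recorded with a start and end time in minutes from the beginning of the run. It separates busy time, when at least one step was in progress, from idle wait states between steps. It also reports flow efficiency (busy time divided by elapsed time) and the roles involved. The step names, roles, and timings are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class StepRecord:
    """One stopwatch reading from a manual deployment run (minutes from start)."""
    name: str
    role: str      # who had to do the step, e.g. "DBA" or "support engineer"
    start: float
    end: float


def summarize(records):
    """Split the elapsed time of a run into busy time and idle wait states."""
    if not records:
        raise ValueError("no steps recorded")
    ordered = sorted(records, key=lambda r: r.start)
    first_start = ordered[0].start
    cursor = first_start  # latest moment covered by some step so far
    busy = 0.0            # minutes during which at least one step was in progress
    waits = []            # (step that started after the gap, idle minutes)
    for r in ordered:
        if r.start > cursor:
            waits.append((r.name, r.start - cursor))
            cursor = r.start
        if r.end > cursor:
            busy += r.end - cursor  # count overlapping steps only once
            cursor = r.end
    elapsed = cursor - first_start
    return {
        "elapsed": elapsed,
        "busy": busy,
        "waits": waits,
        "flow_efficiency": busy / elapsed if elapsed else 1.0,
        "roles": sorted({r.role for r in ordered}),
    }


if __name__ == "__main__":
    run = [
        StepRecord("copy artifacts", "support engineer", 0, 20),
        StepRecord("create schema", "DBA", 45, 75),
        StepRecord("edit config files", "support engineer", 60, 90),
        StepRecord("verify integrations", "developer", 120, 135),
    ]
    print(summarize(run))
```

With these sample readings, 135 minutes elapse but work is happening for only 80 of them. The two gaps, before the schema step and before verification, are exactly the wait states that belong on the value stream map.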
You may need to repeat this process more than once before you're actually successful. Ask qualitative questions of the staff who do the actual work. If certain individuals were replaced by other people, would you have a much poorer outcome? Are vendors necessary for any or all of the process? Is there heavy resistance to the idea of testing your deployment process? These represent constraints on your future ability to make improvements to individual systems within your organization.
Next, look at the overall process of making a small change to that system and then moving that change to production. Again, use value stream mapping to get an overall understanding of the process. You now have a unit of measure, the ability to deploy the system to a new environment; use it to identify likely problems in the overall process. Again, measure the steps and wait states in the process. Do you upgrade a system rather than do a clean deploy? Does this take more or less time? Do an upgrade and a clean deploy yield the same results? Does it take the same time to verify a successful deployment at each stage? Does the process follow a path and stages similar to those of other systems in your enterprise? Is it difficult to verify the versions of code, configurations, and schemas before and after deployment? Is the deployment process consistent in each environment, including production? Are code management practices consistent and easy to verify?
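One way to answer the version questions is to collect what each environment reports about itself and compare the results. This Python sketch assumes you can gather a `{component: version}` map per environment, for example from a health check page or a deployment log. The environment names and version strings are hypothetical. After a release has been promoted all the way to production, any remaining difference points to a skipped step or a manual change.

```python
def version_drift(environments):
    """Report components whose versions are not identical in every environment.

    `environments` maps an environment name to a {component: version} dict.
    The result maps each drifting component to its version per environment,
    using "<missing>" where an environment doesn't report the component at all.
    """
    components = set()
    for versions in environments.values():
        components.update(versions)
    drift = {}
    for component in sorted(components):
        seen = {
            env: versions.get(component, "<missing>")
            for env, versions in environments.items()
        }
        if len(set(seen.values())) > 1:
            drift[component] = seen
    return drift


if __name__ == "__main__":
    reported = {
        "test": {"app": "2.4.1", "config": "r118", "schema": "37"},
        "staging": {"app": "2.4.1", "config": "r118", "schema": "37"},
        "production": {"app": "2.4.1", "config": "r115", "schema": "37"},
    }
    for component, versions in version_drift(reported).items():
        print(component, versions)
```

In this made-up example, the application and schema reached production but the configuration did not. That is the kind of gap that is hard to see when verification is manual.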
Once you've developed a tangible understanding of the processes for managing an individual system in your enterprise, you can start identifying areas for improvement. Brainstorm ideas with members of the delivery, operations, and support teams. If you're not sure who should be involved, think about which teams you would gather in a room to diagnose a serious issue with the system. They all have a stake. Question each part of the process and ask whether it adds value and whether it can be improved. Generate ideas, size them, estimate their impact on the overall process, and prioritize them. A very helpful technique for analyzing problems or bottlenecks is root cause analysis using the '5 Whys', a technique for finding the real cause of a problem and dealing with it rather than simply identifying the symptoms. It generates multiple ideas in a structured and lightweight manner, which is useful when brainstorming improvements.
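Sizing and prioritizing can stay on a whiteboard, but a simple scoring rule keeps the discussion honest. The sketch below uses one hypothetical heuristic: people-hours saved per year for every hour of effort. This is an assumption for illustration, not a standard formula. It also ignores elapsed-time gains from shorter wait states, which you may want to score separately. The ideas and estimates are made up.

```python
from dataclasses import dataclass


@dataclass
class Idea:
    """An improvement idea with rough estimates taken from the value stream map."""
    name: str
    minutes_saved_per_run: float  # people-time saved each time the step runs
    people_per_run: int           # people tied up by the step
    runs_per_year: int            # deployments or changes passing through the step
    effort_hours: float           # rough size of making the improvement

    def hours_saved_per_year(self):
        return self.minutes_saved_per_run * self.people_per_run * self.runs_per_year / 60

    def payoff_ratio(self):
        # Hypothetical score: hours saved per year for every hour invested.
        return self.hours_saved_per_year() / self.effort_hours


def prioritize(ideas):
    """Highest payoff first; Python's sort is stable, so ties keep their order."""
    return sorted(ideas, key=lambda idea: idea.payoff_ratio(), reverse=True)


if __name__ == "__main__":
    backlog = [
        Idea("template environment config", 30, 2, 24, 40),
        Idea("script schema and base data setup", 45, 1, 24, 16),
        Idea("add a health check page", 20, 3, 24, 24),
    ]
    for idea in prioritize(backlog):
        print(f"{idea.name}: {idea.hours_saved_per_year():.0f} h/yr, "
              f"ratio {idea.payoff_ratio():.2f}")
```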
At this point you can start to apply a continuous improvement process to each system using lean startup thinking. At its simplest, think of it as a three-step cycle. First, based on your ideas and priorities, introduce an improvement into the deployment and delivery process through to production. With the modeling described above, you should be able to quantify the change you're going to make, both in terms of the current cost and the improvement you hope to see. Time is the most common measure, either people's time or the elapsed time to complete a step. Second, measure the change in both the individual step and the overall process. It is important to measure whether the change had side benefits or consequences for the overall process. Third, as a team, review the impact of the change on the process and learn from the experience. Even if the change didn't have the impact you expected, you've still learned valuable information about the process and the system it supports. Then review your priority list of improvements again, make the next change to the process of delivering the system into production, and start the improvement cycle again. The "Build - Measure - Learn" cycle that Eric Ries describes in his book The Lean Startup is an extremely powerful and focused approach for improving the management of large corporate systems.
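The measure step is where side effects show up. This Python sketch compares per-step timings from before and after a single change. Alongside the overall total, it reports the steps that improved and the steps that got worse. The step names and numbers are hypothetical.

```python
def compare(baseline, after, tolerance=0.0):
    """Compare per-step timings (minutes) measured before and after one change.

    Returns the change in the overall total and the steps that improved or
    regressed by more than `tolerance`, so side effects aren't hidden by the total.
    """
    improved, regressed = [], []
    for step in sorted(set(baseline) | set(after)):
        # A step missing from one run counts as zero minutes in that run.
        delta = after.get(step, 0) - baseline.get(step, 0)
        if delta < -tolerance:
            improved.append((step, delta))
        elif delta > tolerance:
            regressed.append((step, delta))
    total_delta = sum(after.values()) - sum(baseline.values())
    return {"total_delta": total_delta, "improved": improved, "regressed": regressed}


if __name__ == "__main__":
    before = {"build": 15, "deploy": 90, "verify": 60, "approval wait": 240}
    after = {"build": 15, "deploy": 20, "verify": 75, "approval wait": 240}
    print(compare(before, after))
```

In this made-up run, the deploy step gets 70 minutes faster, verification takes 15 minutes longer, and the 240-minute approval wait is untouched. Each of those findings is worth discussing in the learn step before choosing the next change.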
Manual vs Automated Improvements
Automated processes for the development, testing, deployment, and operations of a system are preferable to manual steps or processes. No question about it. Your desired vision and end state should always be automated and easily evidenced pipelines of delivery and management for your systems.
While you're moving towards that ideal end state always consider manual improvements along the way. The aim here is to develop an ability to continually improve your systems management so you can methodically untangle and improve your enterprise. If you can easily improve manual processes then you should do it immediately - especially if it reduces wait states in your process.
So what should you automate? Anything that you do repeatedly in each environment from development to production. Deployments are a good place for initial focus. If this is currently difficult with a particular system, break the problem down into smaller parts. Are deployable artifacts managed and versioned? Is it difficult to deploy the application without manually changing the configuration each time? Can we automate system verification with a health check page or report that includes configuration, versions of code and schemas, and the current state of any endpoints?
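A health check page doesn't need a framework. The sketch below uses only the Python standard library to serve a JSON report at a hypothetical `/health` path. The report contains versions, configuration, and a crude TCP reachability check for each integration endpoint. The version strings, configuration keys, and host names are placeholders. In a real system they would come from build metadata, the schema version table, and the deployed configuration, and secrets should be left out of the report.

```python
import json
import socket
from http.server import BaseHTTPRequestHandler, HTTPServer

# Placeholder values (hypothetical): in practice read these from build metadata,
# the schema version table, and the deployed configuration. Leave secrets out.
APP_VERSION = "2.4.1"
SCHEMA_VERSION = "37"
CONFIG = {"environment": "staging", "crm_endpoint": "crm.example:8443"}
ENDPOINTS = {"crm": ("crm.example", 8443)}


def tcp_reachable(host, port, timeout=2.0):
    """Crude probe: True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def health_report(probe=tcp_reachable):
    """Build the report; `probe` is injectable so the report can be tested offline."""
    endpoints = {name: probe(host, port) for name, (host, port) in ENDPOINTS.items()}
    return {
        "status": "ok" if all(endpoints.values()) else "degraded",
        "versions": {"app": APP_VERSION, "schema": SCHEMA_VERSION},
        "config": CONFIG,
        "endpoints": endpoints,
    }


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/health":
            self.send_error(404)
            return
        body = json.dumps(health_report()).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8080):
    """Serve the report on localhost until interrupted (this call blocks)."""
    HTTPServer(("127.0.0.1", port), HealthHandler).serve_forever()
```

Calling `serve()` makes the report available at `http://127.0.0.1:8080/health`; the port is an arbitrary choice. Returning 200 with a `degraded` status is one design choice among several. Some load balancers expect a non-200 code instead, so decide per system.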
Once you develop a predictable and continually improving cadence for deploying your critical systems, you have the basic groundwork for improving your overall enterprise footprint. You can start to plan small, lower-risk changes to your architecture and integrations, since the cost of moving to production is a known constraint that you can easily incorporate into an improvement plan. In essence, it's a powerful risk management tool. Deployments become a known and shrinking variable, not an impediment.
After the deployment process is stable and improving then you can start to look at other techniques for reducing your cycle times:
- Functional, stable end points in the system for smoke and functional testing
- Improved system verification points for testing and monitoring including data, integration points, and logging
- Versioned and flexible configurations which enable different test strategies and missions
- Versioned and automated deployments for database schemas including rollback
- Standard health check pages for all systems in your footprint with status, versions, and configurations
- Automated build and deploy tooling and pipelines which manage and report the overall process
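For the schema item in the list, the core idea is a version table plus paired upgrade and rollback scripts, applied in order. The following is a minimal sketch using Python's built-in `sqlite3` module. The tables and migrations are hypothetical. Production databases and dedicated migration tools will differ in detail, for example in whether DDL can be rolled back inside a transaction.

```python
import sqlite3

# Hypothetical migrations, in ascending order: (version, upgrade SQL, rollback SQL).
MIGRATIONS = [
    (1, "CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL)",
        "DROP TABLE customer"),
    (2, "CREATE TABLE online_order (id INTEGER PRIMARY KEY, "
        "customer_id INTEGER REFERENCES customer(id))",
        "DROP TABLE online_order"),
    (3, "CREATE INDEX idx_order_customer ON online_order (customer_id)",
        "DROP INDEX idx_order_customer"),
]


def current_version(conn):
    """Highest applied migration, or 0 for an empty database."""
    conn.execute("CREATE TABLE IF NOT EXISTS schema_version (version INTEGER NOT NULL)")
    (version,) = conn.execute("SELECT MAX(version) FROM schema_version").fetchone()
    return version or 0


def migrate(conn, target):
    """Move the schema up or down to `target` in a single transaction."""
    version = current_version(conn)
    conn.execute("BEGIN")
    try:
        if target > version:
            for v, up, _ in MIGRATIONS:
                if version < v <= target:
                    conn.execute(up)
                    conn.execute("INSERT INTO schema_version (version) VALUES (?)", (v,))
        elif target < version:
            for v, _, down in reversed(MIGRATIONS):
                if target < v <= version:
                    conn.execute(down)
                    conn.execute("DELETE FROM schema_version WHERE version = ?", (v,))
        conn.commit()
    except Exception:
        conn.rollback()  # SQLite DDL is transactional, so nothing is half-applied
        raise
    return current_version(conn)


if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    print("upgraded to", migrate(db, 3))
    print("rolled back to", migrate(db, 1))
```

Because the upgrade and rollback scripts are versioned together, the same `migrate(conn, n)` call can deploy or roll back in every environment, and the version table tells you where each one stands.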
An additional benefit is that you can create greenfield environments for experimentation and investigation. For example, you might want to improve the flexibility of the system configuration for a particular system but you're concerned about "polluting" your existing test environments. Create a clean install on a server with limited capability where your teams can experiment with ideas and techniques, then throw it away once they are confident in their plan. In the early days of continuous improvement you won't have the luxury of automated build, test, and deploy pipelines so you need a safe haven while you're building the capabilities.
Finally, questions always arise about scaling these types of continuous improvement efforts to a large enterprise with dozens or even hundreds of systems. Many of the activities I describe sound labour-intensive, but much of the work can be done with simple tools like whiteboards and spreadsheets. However, the skills and expertise that your teams develop in evaluating and improving the systems they support and integrate are the most important outcome. Once a sustainable and measurable practice of improvement is in place for key systems, you have the opportunity to increase organisational alignment by sharing knowledge and skills across functional groups and different system domains. Scaling comes through broader knowledge and an environment where skills and abilities cross functional boundaries. The true measure of progress will be accelerated learning across your organisation.
About the Author
John Kordyback is a Principal Consultant with ThoughtWorks supporting large projects and programs as a delivery leader and coach. He strongly believes in the cultural and technical benefits of continuous delivery and lean practices for all organizations. John has worked in the insurance, telecommunications, commodity and securities trading, high tech, and airline industries in roles from development and operations to testing and management. Before his technology career John worked as a researcher and practitioner for people with disabilities.
Internalise the initiative with Developers
I wrote a blog post at my previous company in which I tried to provide helpful advice on tackling continuous delivery in enterprises. One of the approaches I found most successful in making headway without becoming entangled in enterprise politics was to internalise the initiative within the dev team. As most enterprises (at least in Australia) still operate with very separate service transition and operations teams, I found that many efforts to improve or establish CD practices were difficult under these organisational structures (lots of people to convince). Building a mock deployment pipeline that mimicked the road to production within 'dev land' provided the flexibility and room for failure and experimentation that were required.
I guess it depends on how much support you have from upper management for the initiative.
Having said this, I remember clearing a bramble bush on my parents' property when I was younger, and my approach then was to cut a tunnel all the way through the middle of the bush. That was more for fun than for any practical purpose, but it had the unexpected effect of killing the entire bush, which made it much easier to clear. Not sure if that has any relevance :)
Link to my blog about CD in the Enterprise: www.industrieit.com/blog/2012/02/a-practical-gu...
Re: Internalise the initiative with Developers
I absolutely agree about building the capability within the dev teams, regardless of the operational or even management support. Often teams get on a treadmill of producing features to the detriment of all else. Starting the practices locally is like building features where the team is the customer. They get an opportunity to delight themselves for a change. The skills are a great addition but the sense of pride and ownership contributes to team culture.
Do it for technical reasons but enjoy the people benefits.
I kind of like your idea of tunneling through the brambles. It sounds like the gardening equivalent of a service layer.
No gardening please
Analogies can be helpful sometimes, when they are small and the audience is getting something completely new.
The biggest problem I see in implementing DevOps, and it's pretty much the same one Brendan Haire notes below, is the politics: Ops and Dev are very separate, and Ops has kittens at the thought of automated deployment into prod. Ops and Dev work to different goals: Ops to preserve the environment to ensure continued production use, and Dev to install new features (which have the potential side effect of disrupting Ops).
Like Brendan, in my latest project I adopted continuous automated delivery within the dev team and then reverted to the more manual process as we reached the boundaries with Ops. We were able to persuade them to accept automation in some areas, but mostly Ops is a controlled environment with a tightly gated entry process.
Re: No gardening please
Seriously, I like stories and analogies but I will keep your feedback in mind for any future writing. You're probably right about it being too long.
I think you and Brendan are touching on the heart of continuous delivery: the removal of the wall between Dev and Ops. I think the best teams have both of these roles on the same team. A lot of internal politics, or poor interpretations of compliance, has led to silos and slow-moving, low-value manual processes.
I've seen some large, well-established companies succeed. Therefore, I'm personally optimistic we can do better in any environment. Maybe it's the gardener in me that makes me sunny.
Thanks for your feedback, it's greatly appreciated.
Craig Motlin Sep 01, 2014