Preparing for Continuous Delivery in the Enterprise
Continuous Delivery as a software delivery strategy is attracting increasing attention and recognition within both IT and the business organization.
The ability that Continuous Delivery provides to bring service improvements to market rapidly and repeatedly aligns naturally with business initiatives to accelerate time-to-market in today’s competitive economic environment while maintaining quality. Frequent, incremental changes also help meet the expectations of today’s “always-on” consumer, whose work and leisure experience of IT services is increasingly based on simple, one-click installation of applications that update automatically.
Virtualization, private cloud and DevOps initiatives are ongoing in many organizations, providing the technical foundations for Continuous Delivery. Given these foundations and the competitive pressure created by growing, successful adoption among industry leaders, it is no surprise that surveys show Continuous Delivery implementation to be one of the current key initiatives for many enterprises.
Where to Start?
As a concept, Continuous Delivery has been around for a while, and has indeed been practiced by the front runners in this field for many years. As a result, there are plenty of books and other references describing the principles and practices of Continuous Delivery, as well as sketching out in detail what the “goal state” can look like.
These materials can provide an important theoretical grounding and help you develop your Continuous Delivery vision. Additional guidance is needed on how to get started on realizing that vision, especially in the context of the existing development and release environment in a large enterprise.
Our experience of helping clients introduce Continuous Delivery and related automation, and partnering with other leading Continuous Delivery experts such as ThoughtWorks, can help. We will describe important steps to preparing a structured approach to a Continuous Delivery implementation that takes you from where you are today to the realization of your vision in well-defined, measurable phases.
Before You Implement: Identifying Potential Challenges
As in any practice-oriented implementation, a clear picture of the current situation is important. Awareness of your “baseline” and of the resulting challenges to the implementation is a prerequisite for a successful Continuous Delivery implementation.
Based on our experience of helping enterprises both large and small introduce Continuous Delivery automation, we can describe a number of factors that can impact a Continuous Delivery implementation.
Not all of these factors will be relevant to your scenario. For those that you do recognize, your project plan should include an action item to identify the scope of the challenge, its potential impact on your delivery goals and possible mitigation steps.
1. Large, monolithic applications
A key aspect of Continuous Delivery is making small, incremental changes to your applications and services. These are simpler to test and also, since only a small part of the application changes each time, make pinpointing and remedying the source of errors and other problems such as slowdowns much easier.
Since each change has a much smaller impact on your target environments than a typical “big bang” release, it can usually be pushed through your pipeline more quickly, too. This leads to faster pipeline runs, shorter dev-to-prod cycles and, thus, steadier feature throughput.
Large, tightly-coupled applications in which many components need to be compiled, tested and deployed together are hard to update incrementally, leading to long development, test and deploy cycles. Quality control and especially root cause analysis are harder, too, since many changes are being implemented at the same time.
The large number of changes and components that need to be modified means that, typically, each release procedure needs to differ slightly from the previous ones. This makes it hard to create a standardized delivery pipeline and benefit from the resulting increase in reliability.
As with any iterative process, improvement through fast feedback is a key part of a successful Continuous Delivery setup. Drawing effective conclusions from one enormous “chunk” of feedback after a monolithic release is hard as there are so many changes and factors unique to this release to consider.
Smaller, faster, more standardized pipeline runs greatly simplify the feedback and improvement cycle.
The Story of the Big App
As part of moving to a more agile development methodology with shorter iterations, a large insurance organization started building a delivery pipeline for a claims processing platform. The platform consisted of a single application based on an internally-developed framework and took over an hour for a full compile and unit test run.
Initially, environment-specific endpoints in the source code required the application to be built separately for each environment, so that a build-functional test-integration test cycle required almost four hours just to build the deliverables!
As a first step, these environment-dependent values were externalized, allowing (in accordance with Continuous Delivery principles) just one deliverable to be built. Weighing in at more than 1 GB, this artifact still took more than 20 minutes simply to be copied to the next pipeline stage, with a further 30 minutes for deployment and processing by the target middleware.
As a result, developers ended up committing changes more quickly than the pipeline could handle. In order to deal with this, the pipeline was throttled to trigger only once per day. This alleviated the bottleneck but meant that failures in the pipeline could no longer be related to individual changes, making it harder to fix errors quickly.
In order to address this challenge, a workstream was initiated to incrementally break out components of the application into separate modules which could be built and deployed independently, allowing for faster feedback cycles with smaller changesets.
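The first step in the story above, externalizing environment-specific values so that one deliverable can be promoted through every stage, can be sketched roughly as follows. This is a minimal illustration rather than the insurer's actual solution: the setting names `CLAIMS_DB_URL` and `POLICY_SERVICE_URL` and the function `load_endpoints` are hypothetical, and environment variables are assumed as the configuration source.

```python
import os

# Hypothetical setting names; the real application's endpoints are not known.
REQUIRED_KEYS = ("CLAIMS_DB_URL", "POLICY_SERVICE_URL")


def load_endpoints(environ=None):
    """Read environment-specific endpoints at startup instead of at build time."""
    source = os.environ if environ is None else environ
    missing = [key for key in REQUIRED_KEYS if key not in source]
    if missing:
        # Fail fast: a deployment without its settings should not start.
        raise RuntimeError("missing settings: " + ", ".join(missing))
    return {key: source[key] for key in REQUIRED_KEYS}
```

Because the values are resolved when the application starts, the same artifact can be deployed unchanged to the test and production environments; only the settings supplied by each environment differ.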
2. Low levels of automation
If you review the various activities that are currently required in your environment to transition a new application version from development to production and identify many manual steps, you may need to consider ways of increasing the level of automation in your pipeline.
It’s not that we should be automating for the sake of automation: manual activities aren’t “banned” from a Continuous Delivery pipeline on principle. Experience simply shows that humans tend to be both slower and less accurate at the kind of repetitive tasks that make up the bulk of a delivery pipeline.
A high percentage of manual steps will thus likely prevent you from being able to scale your Continuous Delivery implementation to the desired number of pipeline runs. In order to meet your throughput and consistency goals, you will usually need either to automate many currently manual steps in your delivery process, or to remove certain steps from the process altogether if suitable alternatives are available.
It is important to treat this automation effort as seriously as any other development effort, applying appropriate design, coding and testing practices in order to avoid ending up with an impossible-to-maintain “ball of mud”. The Infrastructure as Code movement has made significant steps in this area, for instance promoting test-driven development of provisioning and deployment automation and providing supporting tooling.
3. Contended environments
If your organization currently works with a limited pool of shared test environments, there is a risk that you will quickly run into a bottleneck during your Continuous Delivery implementation.
Firstly, the ability to “block” or “reserve” an environment becomes necessary if your delivery pipelines trigger on code changes: two pipelines running side-by-side need to be prevented from attempting to deploy and test in the same environment. Measures also need to be taken to prevent one pipeline from blocking an environment for too long, or from always just beating the other to the required environment, leading to “starvation” for the other project.
Furthermore, an interesting datapoint from the aforementioned survey is that misconfigured or “broken” environments that have been unexpectedly modified by previous teams or test runs are one of the leading causes of deployment failures. Even if your environment pool is sufficiently large to avoid the starvation problem, regular pipeline failures due to misconfigured environments will also limit your ability to reliably deliver new features.
If you plan to be running delivery pipelines at scale, a dynamic pool of available, “clean” target environments is required. Private, public or hybrid cloud platforms, coupled with provisioning and configuration management tools, allow you to grow and shrink this pool automatically and on-demand.
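The reservation concerns described in this section (exclusive use of an environment, no starvation, and no pipeline holding an environment indefinitely) can be illustrated with a small first-come, first-served pool. This is only a sketch under stated assumptions: the class `EnvironmentPool` and its methods are hypothetical, the pool is fixed in size rather than dynamically provisioned, and time is passed in explicitly as `now` (in seconds) to keep the example deterministic.

```python
from collections import deque


class EnvironmentPool:
    """Assigns a fixed set of test environments to pipelines in request order."""

    def __init__(self, environments, max_lease_seconds):
        self._free = deque(environments)
        self._waiting = deque()   # pipelines without an environment, oldest first
        self._leases = {}         # environment -> (pipeline, lease start time)
        self._max_lease_seconds = max_lease_seconds

    def request(self, pipeline, now):
        """Queue the pipeline; return its environment, or None if it must wait."""
        self._waiting.append(pipeline)
        return self._assign(now).get(pipeline)

    def release(self, environment, now):
        """Free an environment and return new {pipeline: environment} assignments."""
        del self._leases[environment]
        self._free.append(environment)
        return self._assign(now)

    def overdue(self, now):
        """List (environment, pipeline) pairs whose lease has run too long."""
        return [
            (environment, pipeline)
            for environment, (pipeline, started) in self._leases.items()
            if now - started > self._max_lease_seconds
        ]

    def _assign(self, now):
        # Strict FIFO: the longest-waiting pipeline always gets the next
        # free environment, so no pipeline can be starved by a faster one.
        assigned = {}
        while self._free and self._waiting:
            environment = self._free.popleft()
            pipeline = self._waiting.popleft()
            self._leases[environment] = (pipeline, now)
            assigned[pipeline] = environment
        return assigned
```

Note that the sketch only reports overdue leases instead of forcibly reclaiming them, and that it hands a released environment straight to the next pipeline. In a real setup, the environment would first need to be reset or re-provisioned to a known “clean” state.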
The Story of the Limited Environments
After a significant push to develop automated tests, a retailer implemented a publishing pipeline for a large customer-facing website. Next to accelerated releases of new content, another anticipated benefit was improved utilization of the three test environments which, due to a complex integration of the CMS, webservers and a number of external endpoints, had been complicated and expensive to set up. In order to prevent environment conflicts, a round-robin system had been implemented, with subsequent pipeline runs blocking until the next test environment became available.
With content changes arriving frequently, it quickly transpired that one suite of tests verifying long-running buying sessions caused the pipelines to back up and eventually overload the orchestrator. First, a hard timeout was added to allow the pipeline to resume. Subsequently, tests started behaving erratically, failing in some runs but then succeeding immediately afterwards, without obvious cause.
Investigation showed that the hard timeout was disrupting the database cleanup procedure, leading to corrupted environments. To prevent this, the long-running test suite was simply disabled, lowering test coverage. After a high-profile glitch, the team “bit the bullet” and set up automated provisioning for their test environment, including development of mocked versions for external endpoints.
Automated test environment setup has since been a requirement for all test automation projects at the organization.
4. Release Management requirements
The canonical delivery pipeline diagram, with its stages and feedback loops all the way through to production, looks temptingly straightforward. In practice, as soon as we approach QA or production in most enterprise environments, an increasing number of release management requirements must be met: creation of a change ticket, placing the change on the agenda of the next Change Board meeting, receiving Change Board approval, confirming deployment windows and so on.
How to integrate such requirements into our delivery pipelines is an important question that Continuous Delivery implementations in enterprises need to address. One option is to simply cap all delivery pipelines at the test stage, i.e. before we run into any release management conditions. The goal is typically to take Continuous Delivery further than just test environments, though.
Can the various release management steps be integrated into the pipeline, e.g. by manually and, eventually, automatically creating and scheduling a change ticket, or by automatically setting a start time on the pipeline’s deployment phase from the change management system?
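One way to picture such an integration is a gate in front of the deployment phase that consults the change ticket before proceeding. The sketch below is illustrative only: `ChangeTicket` and `deployment_decision` are hypothetical names, and a real implementation would fetch the ticket from whatever change management system is in use rather than construct it by hand.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class ChangeTicket:
    """Hypothetical view of a ticket in the change management system."""
    ticket_id: str
    approved: bool
    window_start: datetime
    window_end: datetime


def deployment_decision(ticket: Optional[ChangeTicket], now: datetime) -> str:
    """Decide whether the pipeline's deployment phase may start at `now`."""
    if ticket is None:
        return "blocked: no change ticket"
    if not ticket.approved:
        return "blocked: awaiting approval"
    if now < ticket.window_start:
        # The pipeline can schedule itself for the start of the window.
        return "wait: window opens " + ticket.window_start.isoformat()
    if now >= ticket.window_end:
        return "blocked: window closed"
    return "deploy"
```

The same check works whether the ticket was created manually or by an earlier pipeline stage, which allows the degree of automation to grow over time.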
Does the possibility exist of revisiting the need for certain change management conditions in the first place? The origin of most change management practices is based on providing assurance that only changes of an approved level of quality and stability make it to production – precisely the level of quality and stability that prior stages of a delivery pipeline are intended to verify.
Experience from well-known examples of organizations proficient in Continuous Delivery, such as Netflix, Etsy and others, indicates that quality, traceability and reliability of releases can be achieved using pipelines, without the need for heavyweight change management processes.
5. Scaling up jobs
In a large organization with a diverse service portfolio, there will be many pipelines to manage as you scale your Continuous Delivery implementation. Your service portfolio likely spans different technology platforms, different departments, different internal and external customers, different development and support teams etc.
If each application defines its own custom pipeline, how will that affect management and measurement of your Continuous Delivery implementation as a whole? If every pipeline ends at a different stage in the delivery process, how can metrics such as cycle time, throughput or percentage of successful executions be compared?
A large set of pipelines is easier to manage if each one is based on a standard “template”. Templates can be as simple as a shared Wiki page but are also supported by common pipeline orchestrators such as Jenkins (Templates Plugin), Go (Pipeline Templates) and TFS (Build Process Templates). Standardized pipelines also allow for more meaningful comparative reporting as well as enabling lessons learned in one pipeline to be applied to many others – improvement through feedback being a key component of Continuous Delivery.
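As a rough illustration of comparative reporting, the metrics mentioned above (cycle time, throughput and percentage of successful executions) can be computed in the same way for every pipeline once runs are recorded in a common shape. The record type `PipelineRun` and the function `summarize` are assumptions made for this sketch, as is the choice to measure cycle time from commit to the end of a successful run, in hours.

```python
from dataclasses import dataclass
from statistics import median


@dataclass(frozen=True)
class PipelineRun:
    """Hypothetical record of one pipeline execution; times are in hours."""
    pipeline: str
    committed_at: float
    finished_at: float
    succeeded: bool


def summarize(runs, period_hours):
    """Compute comparable metrics for the runs observed during one period."""
    successful = [run for run in runs if run.succeeded]
    cycle_times = [run.finished_at - run.committed_at for run in successful]
    return {
        "runs": len(runs),
        "success_rate": len(successful) / len(runs) if runs else 0.0,
        "median_cycle_time_hours": median(cycle_times) if cycle_times else None,
        "successful_runs_per_day": len(successful) / period_hours * 24,
    }
```

Such figures are only comparable across pipelines that end at the same stage, which is one of the arguments for standardizing on templates.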
How many templates you should start with depends on the variation across your service portfolio; one per technology stack is often a useful starting point. Over time, you will hopefully be able to consolidate towards just a handful of pipeline types.
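Independently of any particular orchestrator, the template idea can be sketched as a small catalog of standard stage sequences from which each application's pipeline is instantiated. The template names, stage names and the function `create_pipeline` below are invented for illustration and do not correspond to the template features of Jenkins, Go or TFS.

```python
from dataclasses import dataclass

# Hypothetical catalog: one template per technology stack.
TEMPLATES = {
    "java-webapp": ("build", "unit-test", "deploy-test", "functional-test", "deploy-prod"),
    "mobile-app": ("build", "unit-test", "device-test", "publish-to-store"),
}


@dataclass(frozen=True)
class Pipeline:
    application: str
    template: str
    stages: tuple


def create_pipeline(application, template):
    """Instantiate a pipeline from a standard template, never from scratch."""
    if template not in TEMPLATES:
        raise ValueError(f"unknown template {template!r}; choose one of {sorted(TEMPLATES)}")
    return Pipeline(application, template, TEMPLATES[template])
```

Because every pipeline records which template it came from, an improvement to a template's stage list can be rolled out to all pipelines based on it.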
6. Job ownership and security
When everything is running smoothly, it is easy to forget that automated delivery pipelines are processes spanning many parts of the IT organization, from development through testing to production.
When pipeline stages fail, though, it is essential that clear responsibilities are in place to fix things and get the delivery stream running again. Every pipeline stage should have an owner, champion or responsible person or team, tasked not only with fixing problems but also with contributing to feedback-driven improvement of the pipeline as a whole.
Since visibility into the state of the entire pipeline is important for all stakeholders, not just the owners of the individual stages, it is important that any orchestration tool considered offers a suitable security model. For example, developers will probably need to examine the results of a functional test phase to help identify the cause of test failures. They should not be able to disable or modify the configuration of the functional testing step, however.
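The kind of security model described here, where everyone can see results but only a stage's owner can change it, might look like the following in its simplest form. The stage names, team names and the function `is_allowed` are hypothetical; real orchestration tools provide their own, usually richer, permission schemes.

```python
from enum import Enum


class Action(Enum):
    VIEW_RESULTS = "view_results"
    EDIT_STAGE = "edit_stage"   # includes disabling or reconfiguring the stage


# Hypothetical mapping of pipeline stages to the teams that own them.
STAGE_OWNERS = {
    "build": "development",
    "functional-test": "qa",
    "deploy-prod": "operations",
}


def is_allowed(team, stage, action):
    """Everyone may view any stage's results; only the owner may edit a stage."""
    if stage not in STAGE_OWNERS:
        raise KeyError(f"unknown stage: {stage!r}")
    if action is Action.VIEW_RESULTS:
        return True
    return STAGE_OWNERS[stage] == team
```

Making the owner of each stage explicit in the configuration also answers the question of who is responsible when that stage fails.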
The Story of the Orphaned Pipeline
Planning a greenfield project to develop a mobile application and corresponding backend to allow customers to customize their vehicle, a car manufacturer decided to start with Continuous Delivery from the outset. A couple of experienced Build & Release engineers were hired as consultants to set up the pipelines, the project successfully went live and the consultants moved on, job well done.
For a while, everything went smoothly. Then, one of the mobile platform branches suddenly failed. Cue angry calls and emails from business owners, and a quick fix: the failing pipeline stage was simply “cut out” and bypassed. The (only partially tested) application appeared in the store again, and the missing stage was quickly forgotten.
Over time, similar emergencies resulted in more and more segments of the pipeline being disabled or reduced in scope. Small bugs appeared in the application with increasing frequency, but without an owner and with the original creators gone, restoring the pipeline to its original state was on nobody’s radar.
A new member of the development team, coming from a company with a strong Continuous Delivery mindset, wanted to improve the situation and informally “adopted” the pipeline. As a developer, she did not have permissions to change the QA stages of the pipeline herself, so she tracked down the QA team members who had originally worked with the team.
Two QA engineers in particular were happy to see their original efforts restored to working condition, but with a backlog of QA work for other projects, they were only able to help very sporadically.
None of the required fixes was very complicated in itself: archiving of test results to a Wiki was broken due to an API change, the location for automated publishing of documentation had moved, and so on. Without an official project and associated billing code, however, the QA manager was not willing to let his busy team take time to tackle these issues. In the end, it took an escalation to the VP Engineering and continued efforts from the developer and QA team members to finally get a pipeline maintenance project approved.
Such an ongoing project is now created as standard for Continuous Delivery implementations at the company.
Driven by market and customer pressure, market leaders have implemented Continuous Delivery as a strategy to accelerate time-to-market, and leading organizations are looking to follow. While much has been published on the principles and processes of Continuous Delivery, practical advice on how to approach and plan a Continuous Delivery implementation in an enterprise environment is hard to come by.
Analyzing which of the common challenges to Continuous Delivery apply in your situation should be a first preparatory step in your implementation. Mitigating any challenges that you identify early in the project cycle should help your implementation progress smoothly. Experience of addressing these issues will also prepare you to effectively address the next set of challenges you will encounter as the implementation progresses.
By gaining an accurate picture of your current baseline and structuring your implementation in measurable phases, you can commence by addressing these challenges to clear the way for your first delivery pipelines with defined roles and responsibilities for each phase.
Your Continuous Delivery implementation will then be on the way to providing faster releases, more reliable feature delivery and steady improvement driven by quicker feedback and better insight.
About the Author
Andrew Phillips is VP of Products for XebiaLabs, providers of application delivery automation solutions. Andrew is a cloud, service delivery and automation expert and has been part of the shift to more automated application delivery platforms. In his spare time as a developer, he has worked on Multiverse, the open-source STM implementation, contributes to Apache jclouds, the leading cloud library, and co-maintains the Scala Puzzlers site.