Key Takeaways
- We saw negative outcomes from agile scaling frameworks. Focusing on "Why can’t we deliver today’s work today?" forced us to find and fix the technical and process problems that prevented agility.
- We couldn’t deliver better by only changing how development was done. We had to restructure the organization and the application.
- The ability to deliver daily improved business delivery and team morale. It’s a more humane way to work.
- Optimize for operations and always use your emergency process to deliver everything. This ensures hotfixes are safe while also driving improvement in feature delivery.
- If you want to retain the improvement and the people who did it, measure it to show evidence when management changes, or you might lose it and the people.
Struggling with your "agile transformation?" Is your scaling framework not providing the outcomes you hoped for? In this article, we’ll discuss how teams in a large enterprise replaced heavy agile processes with Conway’s Law and better engineering to migrate from quarterly to daily value delivery to the end users.
Replacing Agile with Engineering
We had a problem. After years of "Agile Transformation" followed by rolling out the Scaled Agile Framework, we were not delivering any better. In fact, we delivered less frequently with bigger failures than when we had no defined processes. We had to find a way to deliver better value to the business. SAFe, with all of the ceremonies and process overhead that comes with it, wasn’t getting that done. Our VP read The Phoenix Project, got inspired, and asked the senior engineers in the area to solve the problem. We became agile by making the engineering changes required to implement Continuous Delivery (CD).
Initially, our lead time to deliver a new capability to the business averaged 12 months, from request to delivery. We had to fix that. The main problem, though, is that creating a PI plan for a quarter, executing it in sprints, and delivering whatever passes tests at the end of the quarter ignores the entire reason for agile product development: uncertainty.
Here is the reality: the requirements are wrong, we will misunderstand them during implementation, or the end users’ needs will change before we deliver. One of those is always true. We need to mitigate that with smaller changes and faster feedback to more rapidly identify what’s wrong and change course. Sometimes, we may even decide to abandon the idea entirely. The only way we can do this is to become more efficient and reduce the delivery cost. That requires focusing on everything regarding how we deliver and engineering better ways of doing that.
Why Continuous Delivery?
We wanted the ability to deliver more frequently than 3-4 times per year. We believed that if we took the principles and practices described in Continuous Delivery by Jez Humble and Dave Farley seriously, we’d be able to improve our delivery cadence, possibly even push every code commit directly to production. That was an exciting idea to us as developers, especially considering the heavy process we wanted to replace.
When we began, the minimum time to deliver a normal change was three days. It didn’t matter if it was a one-line change to modify a label or a month’s worth of work -- the manual change control process required at least three days. In practice, it was much worse. Since the teams were organized into feature teams and the system was tightly coupled, the entire massive system had to be tested and released as a single delivery. So, today’s one-line change will be delivered, hopefully, in the next quarterly release unless you miss merge week.
We knew if we could fix this, we could find out faster if we had quality problems, the problems would be smaller and easier to find, and we’d be able to add regression tests to the pipeline to prevent re-occurrence and move quickly to the next thing to deliver. When we got there, it was true. However, we got something more.
We didn’t expect how much better it would be for the people doing the work. I didn’t expect it to change my entire outlook on the work. When you don’t see your work used, it’s joyless. When you can try something, deliver it, and get rapid feedback, it brings joy back to development, even more so when you’ve improved your test suite to the point where you don’t fear every keystroke. Getting into a CD workflow made me intolerant of working the way we were before. I feel process waste as pain. I won’t "test it later when we get time." I won’t work that way ever again. Work shouldn’t suck.
Descale and Decouple for Speed
We knew we’d never be able to reach our goals without changing the system we were delivering. It was truly monstrous. It was the outcome of taking three related legacy systems and a fourth unrelated legacy system and merging them, with some splashy new user interfaces, into a bigger legacy system. A couple of years before this improvement effort, my manager asked how many lines of code the system was. Without comments, it was 25 million lines of executable code. Calling the architecture "spaghetti" would be a dire insult to pasta. Where there were web services, the service boundaries were defined by how big the service was. When it got "too big," a new service, Service040, for example, would be created.
We needed to break it up to make it easier to deliver and modernize the tech stack. Step one was using Domain Driven Design to start untangling the business capabilities in the current system. We aimed to define specific capabilities and assign each to a product team. We knew about Conway’s Law, so we decided that if we were going to get the system architecture we needed, we needed to organize the teams to mirror that architecture. Today, people call that the "reverse Conway maneuver." We didn’t know it had a name. I’ve heard people say it doesn’t work. They are wrong. We got the system architecture we wanted by starting with the team structure and assigning each a product sub-domain. The internal architecture of each team’s domain was up to them. However, they were also encouraged to use and taught how to design small services for the sub-domains of their product.
We also wanted to ensure every team could deliver without the need to coordinate delivery with any other team. Part of that was how we defined the teams’ capabilities, but having the teams focus on Contract Driven Development (CDD) and evolutionary coding was critical. CDD is the process where teams with dependencies collaborate on API contract changes and then validate they can communicate with that new contract before they begin implementing the behaviors. This makes integration the first thing tested, usually within a few days of the discussion. Also important is how the changes are coded.
The consumer needs to write their component in a way that allows their new feature to be tested and delivered with the provider’s new contract but not activated until that contract is ready to be consumed. The provider needs to make changes that do not break the existing contract. Working this way, the consumer or provider can deliver their changes in any order. When both are in place, the new feature is ready to release to the end user.
By deliberately architecting product boundaries, the teams building each product, focusing on evolutionary coding techniques, and "contract first" delivery, we enabled each team to run as fast as possible. SAFe handles dependencies with release trains and PI planning meetings. We handled them with code. For example, if we had a feature that also required another team to implement a change, we could deploy our change and include logic that would activate our feature when their feature was delivered. We could do that either with a configuration change or, depending on the feature, simply have our code recognize the new properties in the contract were available and activate automatically.
Accelerating Forces Learning
It took us about 18 months after forming the first pilot product teams to get the initial teams to daily delivery. I learned from doing CD in the real world that you are not agile without CD. How can you claim to be agile if it takes two or more weeks to validate an idea? You’re emotionally invested by then and have spent too much money to let the idea go.
You cannot execute CD without continuous integration (CI). Because we took CI seriously, we needed to make sure that all of the tests required to validate a change were part of the commit for that change. We had to test during development. However, we were blocked by vague requirements. Focusing on CI pushed the team to understand the business needs and relentlessly remove uncertainty from acceptance criteria.
On my team, we decided that if we needed to debate story points, it was too big and had too much uncertainty to test during development. If we could not agree that anyone on the team could complete something in two days or less, we decomposed the work until we agreed. By doing this, we had the clarity we needed to stop doing exploratory development and hoping that was what was being asked for. Because we were using Behavior Driven Development (BDD) to define the work, we also had a more direct path from requirement to acceptance tests. Then, we just had to code the tests and the feature and run them down the pipeline.
You need to dig deep into quality engineering to be competent at CD. Since the CD pipeline should be the only method for determining if something meets our definition of "releasable," a culture of continuous quality needs to be built. That means we are not simply creating unit tests. We are looking at every step, starting with product discovery, to find ways to validate the outcomes of that step. We are designing fast and efficient test suites. We are using techniques like BDD to validate that the requirements are clear. Testing becomes the job. Development flows from that.
This also takes time for the team to learn, and the best thing to do is find people competent at discovery to help them design better tests. QA professionals who think, "What could go wrong?" and help teams create strategies to detect that, instead of the vast majority who are trained to write test automation, are gold. However, under no circumstances should QA be developing the tests because they become a constraint rather than a force multiplier. CD can’t work that way.
The most important thing I learned was that it’s a more humane way of working. There’s less stress, more teamwork, less fear of breaking something, and much more certainty that we are probably building the right thing. CD is the tool for building high-performing teams.
Optimize for Operations
All pipelines should be designed for operations first. Life is uncertain -- production breaks. We need the ability to fix things quickly without throwing gasoline on a dumpster fire. I carried a support pager for 20 years. The one thing that was true for most of that time was that we always had some workaround process for delivering things in an emergency. This means that the handoffs we had for testing for normal changes were bypassed for an emergency. Then, we would be in a dumpster fire, hoping our bucket contained water and not gasoline.
With CD, that’s not allowed. We have precisely one process to deliver any change: the pipeline. The pipeline should be deterministic and contain all the validations to certify that an artifact meets our definition of "releasable." Since, as a principle, we never bypass or deactivate quality gates in the pipeline for emergencies, we must design good tests for all of our acceptance criteria and continue to refine them to be fast, efficient, and effective as we learn more about possible failure conditions. This ensures hotfixes are safe while also driving improvement in feature delivery. This takes time, and the best thing to do is to define all of the acceptance criteria and measure how long it takes to complete them all, even the manual steps. Then, use the cycle time of each manual process as a roadmap for what to automate next.
What we did was focus on the duration of our pipeline and ensure we were testing everything required to deliver our product. We, the developers, took over all of the test automation. This took a lot of conversation with our Quality Engineering area since the pattern before was for them to write all the tests. However, we convinced them to let us try our way. The results proved that our way was better. We no longer had tests getting out of sync with the code, the tests ran faster, and they were far less likely to return false positives. We trusted our tests more every day, and they proved their value later on when the team was put under extreme stress by an expanding scope, shrinking timeline, "critical project."
Another critical design consideration was that we needed to validate that each component was deliverable without integrating the entire system. Using E2E tests for acceptance testing is a common but flawed practice. If we execute DDD correctly, then any need to do E2E for acceptance testing can be viewed as an architecture defect. They also harm our ability to address impacting incidents.
For example, if done with live services, one of the tests we needed to run required creating a dummy purchase order in another system, flowing that PO through the upstream supply chain systems, processing that PO with the legacy system we were breaking apart, and then running our test. Each test run required around four hours. That’s a way to validate that our acceptance tests are valid occasionally, but not a good way to do acceptance testing, especially not during an emergency. Instead, we created a virtual service. That service could return a mock response when we sent a test header so we could validate we were integrating correctly. That test required milliseconds to execute rather than hours.
We could run it every time we made a change to the trunk (multiple times per day) and have a high level of confidence that we didn’t break anything. That test also prevented a problem from becoming a bigger problem. My team ran our pipeline, and that test failed. The other team had accidentally broken their contract, and the test caught it within minutes of it happening and before that break could flow to production. Our focus on DDD resulted in faster, more reliable tests than any of the attempts at E2E testing that the testing area attempted. Because of that, CD made operations more robust.
Engineering Trumps Scaling Frameworks
We loved development again when we were able to get CD off the ground. Delivery is addictive, and the more frequently we can do that, the faster we learn. Relentlessly overcoming the problems that prevent frequent delivery also lowers process overhead and the cost of change. That, in turn, makes it economically more attractive to try new ideas and get feedback instead of just hoping your investment returns results. You don’t need SAFe’s PI planning when you have product roadmaps and teams that handle dependencies with code. PI plans are static.
Roadmaps adjust from what we learn after delivering. Spending two days planning how to manage dependencies with the process and keeping teams in lock-step means every team delivers at the pace of the slowest team. If we decouple and descale, teams are unchained. Costs decrease. Feedback loops accelerate. People are happier. All of these are better for the bottom line.
On the first team where we implemented CD, we improved our delivery cadence from monthly (or less) to several times per day. We had removed so much friction from the process that we could get ideas from the users, decompose them, develop them, and deliver them within 48 hours. Smaller tweaks could take less than a couple of hours from when we received the idea from the field. That feedback loop raised the quality and enjoyment level for us and our end users.
Measure the Flow!
Metrics is a deep topic and one I talk about frequently. One big mistake we made was not measuring the impact of our improvements on the business. When management changed, we didn’t have a way to show what we were doing was better. To be frank, the new management had other ideas – poorly educated ideas. Things degraded, and the best people left. Since then, I’ve become a bit obsessed with measuring things correctly.
For a team wanting to get closer to CD, focus on a few things first:
- How frequently are we integrating code into the trunk? For CI, this should be at least once per day per team member on average. Anything less is not CI. CI is a forcing function for learning to break changes into small, deliverable pieces.
- How long does it take for us, as a team, to deliver a story? We want this to be two days maximum. Tracking this and keeping it small forces us to get into the details and reduce uncertainty. It also makes it easy for us to forecast delivery and identify when something is trending later than planned. Keep things small.
- How long does it take for a change to reach the end of our pipeline? Pipeline cycle time is a critical quality feedback loop and needs to keep improving.
- How many defects are reported week to week? It doesn’t matter if they are implementation errors, "I didn’t expect it to work this way," or "I don’t like this color." Treat them all the same. They all indicate some failure in our quality process. Quality starts with the idea, not with coding.
Since this journey, I’ve become passionate about continuous delivery as a forcing function for quality. I’ve seen on multiple teams in multiple organizations what a positive impact it has on everything about the work and the outcomes. As a community, we have also seen many organizations not take a holistic approach, throw tools at the problem, ignore the fact that this is a quality initiative, and hurt themselves. It’s important that people understand the principles and recommended practices before diving in head first.
You won’t be agile by focusing on agile frameworks. Agility requires changing everything we do, beginning with engineering our systems for lower delivery friction. By asking ourselves, "Why can’t we deliver today’s work today?" and then relentlessly solving those problems, we improve everything about how we work as an organization. Deploy more and sleep better.