The closing keynote of the first day of the GOTO Berlin 2015 conference was given by Nicole Forsgren. She explored how high-performing organizations use DevOps and lean management to deliver quickly and reliably.
InfoQ interviewed her about why organizations are starting to embrace DevOps methods, how being able to deploy fast can also increase IT stability, what to focus on when changing the organizational culture to improve performance, how lean management can help to increase performance, and what advice she has for organizations that want to apply DevOps to increase their performance.
InfoQ: The State of DevOps 2015 report talks about high performing organizations which are able to deliver code fastest and most reliably. Can you explain why these results are interesting, and why organizations are starting to embrace DevOps methods in their software development and delivery processes?
Forsgren: Both the 2014 and 2015 State of DevOps reports show that high performing IT organizations can achieve much more than their counterparts both in terms of throughput -- things like deploys per day and speed of cycle time -- AND stability -- mean time to restore and change success rate. The ability to deliver both throughput and stability in your software delivery is a huge value add, and is something that traditional IT has preached was not possible. That is, we were always told that tradeoffs had to be made in throughput in order to preserve stability, but DevOps promises something better. The data shows that throughput and stability metrics move in tandem — effectively not supporting ITIL claims that tradeoffs should be made in throughput in order to get stability. The pattern of needing to trade throughput for stability simply doesn’t appear in the data. And what does that mean for customers? Increased throughput could mean getting content or new features to market faster, or responding to compliance and regulatory changes faster, all while maintaining increasingly reliable software and infrastructure.
InfoQ: Can you elaborate on how being able to deploy fast can also increase IT stability?
Forsgren: Deploying fast comes from fast cycle times (code commit to code deploy) and high deployment frequency, and these are generally outcomes of work that is highly integrated and done in very small batches. These small batches allow workers to find and fix problems very quickly, which contributes to stability. They also allow the fix itself to move through the system very quickly, which further contributes to stability. So you can imagine two scenarios. In the new paradigm, a developer commits a small change, and it is automatically tested, built, and -- if all tests pass -- deployed to production upon approval. In this scenario, most errors are caught early in the process, decreasing the chance of changes failing or interrupting service, and if they do, time to restore service is short. In the old paradigm, many developers work on large chunks of code for weeks or months before finally deploying to production. If something fails on deployment in the first scenario, there is only a small piece of code to debug; it is easy to roll back the change and find a solution. In the second scenario, that very large chunk of code is a huge change to the production environment, so a rollback might not be possible, and even if it is, debugging that code is difficult and costly -- this increases the chance that changes introduced to production will fail or interrupt service in some way. And any emergency changes are likely beholden to a very long and slow deployment process, further increasing time to restore service.
InfoQ: What should organizations focus on if they want to change their organizational culture to improve their performance?
Forsgren: The key aspects of a good culture in a DevOps paradigm are information flow and trust. These are particularly important because you are bringing together groups that have not always worked together, and have traditionally had competing goals (i.e., developers tried to maximize content delivery to production, while operations tried to maintain stability -- accomplished by stopping any changes to production). Information flow allows these groups to communicate the hows and whys of what they do, and trust allows empathy to grow and fill in any gaps that may exist. Some key practices that are often used to help foster information flow and trust are stand-ups and blameless postmortems.
InfoQ: Can you elaborate on how lean management fits into all of this? Which lean management practices can help to increase performance?
Forsgren: Wow. I’m not really sure where to even start on this one, because lean management informs so much of the DevOps movement. It’s a huge piece of the definition of IT performance, as high throughput and single-piece flow are central components of lean manufacturing and its focus on removing waste. It also shows up in continuous integration, where fast feedback is important. It even reveals itself to an extent in the name of the movement itself: DevOps brings Development and IT Operations together to work more closely, perhaps even physically. We see this in lean manufacturing with the use of manufacturing cells that specialize in making something that delivers value to an organization -- they physically rearrange the teams into a closer, tighter unit to minimize handoff costs and optimize communication. The 2015 State of DevOps report investigated additional practices specific to lean, such as WIP limits and visualization, and found they had a significant effect on IT performance.
InfoQ: What advice would you give to organizations that want to apply DevOps to increase their performance?
Forsgren: When you’re ready to start your DevOps journey, it’s important to remember it’s not just about the automation and tooling; you also need to focus on your processes and your culture. It is also important to capture metrics for your journey: You can’t improve what you don’t measure. Beyond that, selecting the right project is very important. Pick a project that delivers value to the business, that includes players from all key practices (e.g., development, QA, test, operations, and security), and that can deliver a prototype in less than four months. Target and Nordstrom have been very successful in adopting this approach, using early projects to iterate and learn, expanding their lessons learned to other areas of the business, and delivering value along the way.
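For readers who want to act on the advice to capture metrics, the throughput and stability measures named in the interview (deploys per day, cycle time from commit to deploy, change success rate, and mean time to restore) can be computed from a simple log of deployments. The following Python sketch is only an illustration: the `Deployment` record layout, the `delivery_metrics` function, and the sample data are hypothetical and are not taken from the State of DevOps reports or from any particular tool.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Deployment:
    """One production deployment (hypothetical record layout)."""
    committed_at: datetime                  # when the change was committed
    deployed_at: datetime                   # when it reached production
    failed: bool = False                    # did the change fail or interrupt service?
    restored_at: Optional[datetime] = None  # when service was restored, if it failed


def delivery_metrics(deployments: list[Deployment], days: int) -> dict[str, float]:
    """Summarize throughput and stability over a window of `days` days."""
    if not deployments or days <= 0:
        raise ValueError("need at least one deployment and a positive window")

    # Cycle time: hours from code commit to code deploy.
    cycle_hours = [
        (d.deployed_at - d.committed_at).total_seconds() / 3600
        for d in deployments
    ]
    # Time to restore: hours from a failed deploy to restored service.
    restore_hours = [
        (d.restored_at - d.deployed_at).total_seconds() / 3600
        for d in deployments
        if d.failed and d.restored_at is not None
    ]
    failures = sum(1 for d in deployments if d.failed)

    return {
        "deploys_per_day": len(deployments) / days,
        "mean_cycle_time_hours": sum(cycle_hours) / len(cycle_hours),
        "change_success_rate": 1 - failures / len(deployments),
        # Reported as 0.0 when no failed deployment has been restored yet.
        "mean_time_to_restore_hours": (
            sum(restore_hours) / len(restore_hours) if restore_hours else 0.0
        ),
    }


# Made-up sample: two deployments in one day, one of which failed
# and was restored an hour after it was deployed.
start = datetime(2015, 12, 1, 9, 0)
sample = [
    Deployment(start, start + timedelta(hours=2)),
    Deployment(
        start,
        start + timedelta(hours=4),
        failed=True,
        restored_at=start + timedelta(hours=5),
    ),
]
metrics = delivery_metrics(sample, days=1)
print(metrics)
```

Tracking these four numbers together, rather than throughput or stability alone, makes it possible to check whether the two move in tandem for a given team, as the interview describes.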