Experiences from Measuring the DevOps Four Key Metrics: Identifying Areas for Improvement

Measuring the four key metrics of DevOps, which were originally specified in the book Accelerate, helped a company assess the performance of its software delivery process. Continuous observation of these metrics supports decisions on where to invest and guides performance improvements.

Nikolaus Huber, a software architect at Reservix, shared his experiences from measuring the software delivery process of their SaaS product at DevOpsCon Berlin 2021.

The four key metrics are Deployment Frequency (how often new releases go to production), Lead Time For Changes (the time from a commit until it goes to production), Mean Time to Restore (the time it takes to resolve a service impairment in production), and Change Failure Rate (the ratio of failed deployments, i.e. those that lead to errors in production, to successful deployments).

These metrics have a strong scientific background, Huber explained:

The four key metrics have been described in the book Accelerate (see the Q&A on Accelerate on InfoQ) and further refined in the State of DevOps Reports. The authors applied cluster analysis to their survey data and found that these metrics can be used to categorize the performance of the software delivery process. For example, elite performers deploy on demand, whereas low performers deploy between once per month and once every six months.

Measuring these metrics can help you to assess the performance of your software delivery process, Huber mentioned. As software delivery performance has a positive impact on your organizational performance, you can also work on improving your software delivery process by improving these metrics.

Huber described what their deployment frequency looks like and what they have learned from analyzing the data:

Between January and May 2021, I observed a mean deployment frequency of 13.04 deployments per day, which is pretty good. Our fully automated deployment process gives us the ability to deploy quickly and on demand. And it’s good to have small batch sizes and integrate frequently. This is why we have such a good deployment frequency.
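As an illustration of how a mean deployment frequency like the one above could be derived from a deployment job log, here is a minimal Python sketch. It is not Huber's R Markdown analysis: the function name `mean_deployments_per_day` is hypothetical, and dividing by calendar days (rather than working days) is an assumption the article does not confirm.

```python
from datetime import date, datetime


def mean_deployments_per_day(timestamps, start, end):
    """Mean number of deployments per calendar day in [start, end].

    `timestamps` holds one datetime per production deployment, e.g. as
    parsed from a deployment job log. Counting calendar days instead of
    working days is an assumption made for this sketch.
    """
    days = (end - start).days + 1  # inclusive range
    in_range = [t for t in timestamps if start <= t.date() <= end]
    return len(in_range) / days


# Hypothetical sample data: three deployments over two days.
sample = [
    datetime(2021, 1, 4, 9, 0),
    datetime(2021, 1, 4, 15, 30),
    datetime(2021, 1, 5, 10, 0),
]
frequency = mean_deployments_per_day(sample, date(2021, 1, 4), date(2021, 1, 5))
```

Whether weekends and holidays count toward the denominator changes the result noticeably, so the choice should be stated alongside the number.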

Measuring the change failure rate turned out to be rather challenging, as Huber explained:

The change failure rate turned out to be the most complicated metric to measure for us since we currently have no link between a deployment and a system impairment. So I had to find workarounds and ended up analyzing the deployment job log and the Git history for keywords like "revert" and "hotfix" to calculate the change failure rate.
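The keyword workaround described above might look roughly like the following Python sketch. The grouping of commit messages per deployment, the names `is_remediation` and `change_failure_rate`, and the choice to report the result as a share of all deployments are assumptions made for illustration; the article does not describe the actual scripts in that detail.

```python
import re

# Keywords named in the interview; matching is case-insensitive and
# also catches variants such as "Reverts" or "hotfixes".
REMEDIATION = re.compile(r"\b(revert|hotfix)", re.IGNORECASE)


def is_remediation(message):
    """True if a commit message looks like a revert or a hotfix."""
    return bool(REMEDIATION.search(message))


def change_failure_rate(deployments):
    """Share of deployments that contain at least one remediation commit.

    `deployments` is a list of deployments, each given as the list of
    commit messages it shipped. A remediation commit is taken as a
    proxy for one earlier failed change (an assumption of this sketch).
    """
    if not deployments:
        return 0.0
    flagged = sum(
        1 for messages in deployments
        if any(is_remediation(m) for m in messages)
    )
    return flagged / len(deployments)


# Hypothetical sample data: four deployments, two of them remedial.
sample = [
    ["Add seat map export"],
    ['Revert "Add seat map export"'],
    ["hotfix: handle missing venue id", "Update docs"],
    ["Refactor payment client"],
]
rate = change_failure_rate(sample)
```

A keyword heuristic of this kind misses failures that were fixed without a telling commit message and counts reverts made for other reasons, which is one reason a direct link between deployments and impairments is preferable.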

Huber mentioned that they need to invest in tool improvements to get a better understanding of which deployment caused an error:

In particular, we need a link between deployment and potential system impairment, e.g. by linking deployment job log and monitoring data, or by logging information about the released version on errors.
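One way to log information about the released version on errors is to attach a release identifier to every log record. The Python sketch below uses the standard `logging` module; the `ReleaseFilter` class, the `build_logger` helper, and the release value passed in are hypothetical and only illustrate the idea, not the tooling used at Reservix.

```python
import logging


class ReleaseFilter(logging.Filter):
    """Adds the deployed release identifier to every log record."""

    def __init__(self, release):
        super().__init__()
        self.release = release

    def filter(self, record):
        record.release = self.release
        return True  # never drop a record, only annotate it


def build_logger(name, release, handler=None):
    """Return a logger whose output carries `release=<id>` on each line.

    `release` would typically be a Git commit hash or version tag
    injected at deployment time (an assumption of this sketch).
    """
    handler = handler or logging.StreamHandler()
    handler.setFormatter(logging.Formatter(
        "%(asctime)s %(levelname)s release=%(release)s %(message)s"
    ))
    handler.addFilter(ReleaseFilter(release))
    logger = logging.getLogger(name)
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger
```

With the release identifier present in error logs, errors in monitoring data can be grouped by release and joined with the deployment job log.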

InfoQ interviewed Nikolaus Huber about their experience in measuring the software delivery process.

InfoQ: What methods and tools have you used to obtain the four key metrics for the software delivery process of your SaaS product?

Nikolaus Huber: I examined different data sources (Jira reports, Git history, deployment job logs, incident reports, and GitLab analytics data) and ended up using simply the Git history and the deployment job log as data sources. I wrote R Markdown scripts to import and analyze the data and calculate the metrics described above. The approach is not yet fully automated (MTTR is calculated manually from incident reports) and needs some tooling improvement, but it’s simple and convenient.

I also had a look at other tools like GitLab Analytics or the Four Keys Project, but they were not applicable to our situation. The reasons are that with GitLab, you need to use the full feature set (e.g. the issue tracker), which we don’t, and the Four Keys Project requires dedicated infrastructure, which I don’t want to maintain.

InfoQ: What did you learn from measuring the lead time for changes?

Huber: We have a mean lead time for changes of 1.25 days, which is a good value (high performer category). Actually, this value was already this good when I started working on measuring these metrics. So far, we haven’t worked on improvements.

Conversely, measuring this value helped us to not slow down the tempo as we also had discussions on whether the tempo was too fast. But comparing our metrics to others gives us the confidence to keep the tempo. If we experience too many errors, then we should rather work on improving the stability metrics (MTTR and Change Failure Rate).
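To illustrate how a mean lead time for changes could be computed from the Git history and the deployment job log, here is a minimal Python sketch. It assumes that each commit ships with the first deployment at or after its commit time; that matching rule and the function names are assumptions, since the interview does not spell out how commits were mapped to deployments.

```python
from bisect import bisect_left
from datetime import datetime, timedelta


def lead_times(commit_times, deploy_times):
    """Time from each commit to the first deployment at or after it.

    Commits that are newer than the last deployment have not shipped
    yet and are skipped.
    """
    deploys = sorted(deploy_times)
    result = []
    for committed in commit_times:
        i = bisect_left(deploys, committed)
        if i < len(deploys):
            result.append(deploys[i] - committed)
    return result


def mean_lead_time_days(commit_times, deploy_times):
    """Mean lead time in days, or None if no commit has shipped."""
    deltas = lead_times(commit_times, deploy_times)
    if not deltas:
        return None
    mean = sum(deltas, timedelta()) / len(deltas)
    return mean / timedelta(days=1)


# Hypothetical sample data: two commits shipped by one deployment.
commits = [datetime(2021, 3, 1, 12, 0), datetime(2021, 3, 2, 0, 0)]
deploys = [datetime(2021, 3, 2, 12, 0)]
mean_days = mean_lead_time_days(commits, deploys)
```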

InfoQ: How did you analyze the mean time to restore data? What insights did that bring you?

Huber: Currently, analyzing Mean Time to Restore is difficult since I don’t have monitoring data over a long period of time, which is why I had to fall back on writing incident reports when we have a system outage. So here my insights were that we needed to improve our tooling to get more reliable and automated data analysis.

From the pure metrics perspective, I can say that the root cause of the longest system outages was the infrastructure provider. Here, we have already taken countermeasures by moving services to the cloud. It would be interesting to analyse the effect of these countermeasures on MTTR, which is part of future work.
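Until monitoring data is available, the manual MTTR calculation from incident reports could be scripted along these lines. This Python sketch assumes each incident report yields a start time and a restore time; the function name and the sample incidents are hypothetical.

```python
from datetime import datetime, timedelta


def mean_time_to_restore(incidents):
    """Mean outage duration, or None if there are no incidents.

    `incidents` is a list of (started, restored) datetime pairs, e.g.
    transcribed from incident reports.
    """
    durations = [restored - started for started, restored in incidents]
    if not durations:
        return None
    return sum(durations, timedelta()) / len(durations)


# Hypothetical sample data: one 1-hour and one 3-hour outage.
sample = [
    (datetime(2021, 2, 1, 10, 0), datetime(2021, 2, 1, 11, 0)),
    (datetime(2021, 4, 12, 20, 0), datetime(2021, 4, 12, 23, 0)),
]
mttr = mean_time_to_restore(sample)
```

Because a few long outages can dominate the mean, it can be worth reporting the longest incidents and their root causes next to the figure, as done in the answer above.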

InfoQ: What have you learned from measuring and analyzing the four key metrics?

Huber: There are a lot of learnings. First of all, measuring software delivery is possible but also more time-consuming than I thought. So if you’re interested only in a quick comparison of your software delivery performance, start by estimating the metrics for your process.

Nevertheless, investing more time into analysing these metrics helped us to get a better understanding of the software delivery process and what is important. For example, for us it is better to invest in stability rather than reducing tempo. This was a very valuable discussion and insight which we would not have if we hadn’t invested in examining these metrics.

And in general, we as a team learned that the software delivery process we have is performing very well and that this is a great success.

InfoQ: What will be the next steps?

Huber: I think it is necessary that we invest in getting a better grip on the stability metrics, e.g. by improving monitoring. And then, from a software delivery process perspective, we should use the learnings from the DevOps Reports to invest in practices that improve stability, e.g. by improving our automated testing capabilities.
