
Improving Speed and Stability of Software Delivery Simultaneously at Siemens Healthineers


Key Takeaways

  • At the Siemens Healthineers teamplay digital health platform, an organizational and sociotechnical transformation towards faster and more stable software delivery has been driven in a highly regulated medical domain. Both speed and stability of delivery improved at the same time. 
  • Driving a software delivery transformation in a larger software delivery organization operating in a regulated domain takes a significant amount of time, because it requires fundamental changes to the regulatory quality management system. 
  • Measuring speed and stability of software delivery during organizational transformation towards faster delivery provides a valuable set of data-driven goals to work against and assess progress. 
  • The book “Accelerate” provides solid and popular research on successful software delivery practices based on a survey of about 400 software delivery organizations. It identifies clusters of low and high performing organizations based on speed and stability of software delivery. 
  • The identified clusters of organizations from “Accelerate” are the same for speed and stability. That is, speed and stability go together when it comes to software delivery. Fast delivery is stable. Slow delivery is unstable. This was confirmed by our transformation: speed and stability improved at the same time. 

Driving a software delivery transformation in the healthcare domain

In this article, we focus on the software delivery process at Siemens Healthineers Digital Health. The process is subject to strict regulations valid in the medical industry. We show our journey of transforming the process towards speed and stability. Both measures improved at the same time during the transformation, confirming research from the "Accelerate" book.

Domain

Siemens Healthineers is a medical technology company with the purpose of driving innovation to help humans live healthier and longer. Within Siemens Healthineers, the teamplay digital health platform is the enabler of digital transformation for medical institutions, with the goal of turning data into cost savings and better care. The platform provides easy access to solutions for operational, clinical and shared decision support. It provides a secure and regulatory-compliant environment for integrating digital solutions into clinical routines, fostering cross-departmental and cross-institutional interoperability. Moreover, the platform provides access to transformative and AI-powered applications for data-driven decision support, from Siemens Healthineers and curated partners.

To date, more than 6,500 institutions and 32,000 systems from 75 countries are connected to the platform. This makes more than 30 million patient records accessible across institutions. The platform is open to SaaS and PaaS partners alike. SaaS partners make their existing applications available through the teamplay digital marketplace. PaaS partners develop new applications and services leveraging teamplay APIs.

The teamplay platform is cloud-based. It is built on top of Microsoft Azure, with privacy and security by design and default. The speed and stability of software delivery are central to teamplay. In 2015, the speed and stability were insufficient. With this insight, the transformation of the software delivery process at teamplay began the same year. The goal of the transformation was to make the software delivery faster and more stable. In order to achieve the goal, a large number of people, process, technology and regulatory changes were implemented over the years.

Transformation roadmap

As part of the transformation process, a whole host of new methodologies were introduced: HDD, BDD, TDD, user story mapping, pairing, independent deployment pipelines, Test DSL, SRE and Kanban. These are described in detail in a previous InfoQ article, “Adopting Continuous Delivery at teamplay, Siemens Healthineers”. The adoption and “stickiness” of the methodologies differed by team. The following picture maps the major milestones of the transformation over time.

In 2015 the need for transformation became apparent. As a nascent platform in the enterprise, delivered based on the enterprise-wide regulatory quality management system (QMS) for both hardware and software products, we faced demands for delivery speed and stability that we could not meet. The product owners were entering the digital services market, which was totally new to the company at the time. There was no knowledge available as to which services would resonate with the users, which ones the users would be willing to pay for, and which feature sets would be most valuable. Thus, the need for fast experimentation with ideas turned into software was high. Fortnightly or monthly software releases, and immediate hotfixes on demand, would have been welcomed by the product owners. This was far from the software delivery the organization was set up to do. It was obvious that the changes to the QMS would require significant expertise from the regulatory department. We started a long-term initiative towards making the QMS leaner. Internally in R&D, we increased our emphasis on automated testing.

In 2016, we started the BDD movement. This was done as part of the automated testing improvements. It had a broad impact on requirement specification, automated testing, test implementation, test reporting and understandability of test results by all roles. Whereas in the past requirements were big, the introduction of BDD forced the product owners to break them down into rather small user stories. Each user story started being broken down even further by the entire team into a set of small BDD scenarios (specification by example using the Given / When / Then statements). The teams welcomed these changes as they addressed a long-term developer concern that the requirements were too big and bulky to implement in a short time frame. Smaller requirements led to smaller automated tests. Smaller automated tests led to more stable automation. Despite these great and necessary improvements, the overall speed of transformation was rather slow. In terms of QMS changes, we performed an analysis of how the reduction of the number of roles, deliverables, activities and workflow breaks could be done while still maintaining the required regulatory compliance.
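A BDD scenario of this kind can be sketched as a plain test in Given / When / Then form. The user story, names and sharing rule below are invented for illustration and are not from the teamplay codebase:

```python
# Hypothetical user story: "As a clinician, I can share a report with a colleague."
# One BDD scenario for it, written as a plain test structured in Given/When/Then.

class ReportSharing:
    """Minimal stand-in for the system under test."""
    def __init__(self):
        self.shared_with = []

    def share(self, report_id, recipient):
        # Invented business rule: sharing only within the institution.
        if not recipient.endswith("@hospital.example"):
            raise ValueError("recipient outside the institution")
        self.shared_with.append((report_id, recipient))

def test_share_report_with_colleague():
    # Given a clinician with an existing report
    system = ReportSharing()
    # When she shares it with a colleague in the same institution
    system.share("report-42", "colleague@hospital.example")
    # Then the colleague gains access to the report
    assert ("report-42", "colleague@hospital.example") in system.shared_with

test_share_report_with_colleague()
```

In practice such scenarios are usually written in business-readable text and bound to test code, so that product owners and developers can review the same specification.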

In 2017, we brought in Continuous Delivery consultants to speed up the transformation. Dave Farley from Continuous Delivery Ltd. provided strategic consulting as well as training for managers, product owners, architects and developers. Many consultants from Equal Experts Ltd. worked alongside our product owners, architects and developers at all locations in order to jointly deliver features using many methods and techniques new to our teams. Specifically, the application of BDD, TDD, user story mapping and pair programming was the focus during the consulting activities. By co-working with our teams, the consultants showed our developers, architects and product owners firsthand how to work in new ways, implement independent deployment pipelines, put in place initial observability, etc. In addition, we brought in medical QMS consultants from Johner Institute GmbH to discuss our analysis of QMS changes and confirm that the changes could be done while preserving regulatory compliance.

In 2018, we continued working with consultants adopting the Continuous Delivery ways of working in a deeper manner. This time it was not about introducing new methods, but rather ingraining the methods introduced before into the daily lives of teams and team members on a sustainable basis. In the spirit of the Japanese martial art concept Shu-Ha-Ri that describes three stages of learning on the path to mastery (Shu - follow the master, Ha - learn from other masters and refine your practice, Ri - come up with your own techniques), we transitioned from the Shu to Ha stage of learning. The goal was to embed the new ways of working to the point where the involvement of consultants would no longer be necessary to sustain the new practice. We reached a stage where Continuous Delivery ways of working became the standard for all new digital health products. On the regulatory side of the transformation, we brought in the BDD-based requirement engineering officially into the regulatory QMS.

In 2019, we made the first QMS release that enabled Continuous Delivery ways of working in the teams. The tools for QMS were released alongside. For requirement engineering, we validated the product “Modern Requirements for Azure DevOps” in a formal way using a validation plan and associated tests. It streamlined the requirement baselining, requirement review process and traceability of requirements.

For regulatory reporting purposes, we implemented our own tool, dubbed “QTracer”. This tool was also validated in a formal way using a validation plan and associated tests. The combination of the new QMS and the associated tooling enabled the teams to make regulatory-compliant releases more efficiently, with reduced regulatory overhead.
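QTracer's internal design is not public, so the following is only a generic sketch of what requirement-to-test traceability involves: mapping each requirement ID to the automated tests covering it, and flagging uncovered or failing requirements before a release. All identifiers are hypothetical:

```python
# Generic requirement-to-test traceability sketch (illustrative only;
# QTracer's actual design is not described in the article).

def trace(requirements, test_results):
    """requirements: iterable of requirement IDs.
    test_results: list of (test_name, covered_requirement_id, passed)."""
    matrix = {req: [] for req in requirements}
    for test_name, req_id, passed in test_results:
        if req_id in matrix:
            matrix[req_id].append((test_name, passed))
    # Requirements with no covering test must block a regulated release.
    uncovered = [req for req, tests in matrix.items() if not tests]
    # Requirements whose covering tests do not all pass need investigation.
    failing = [req for req, tests in matrix.items()
               if tests and not all(ok for _, ok in tests)]
    return matrix, uncovered, failing

matrix, uncovered, failing = trace(
    ["REQ-1", "REQ-2", "REQ-3"],
    [("test_login", "REQ-1", True), ("test_export", "REQ-2", False)],
)
# REQ-3 has no covering test; REQ-2 has a failing one.
```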

The first signs of the overall impact of the transformation were observed in the stability of delivery. The production deployment failure rate for all deployments done in the year fell by a factor of 2 compared to the previous year.

In 2020, the breakthrough of the transformation came. The production deployment lead time for all deployments done in the year was reduced by a factor of 2.4 compared to the previous year. At the same time, the production deployment failure rate for all deployments done in the year fell by a factor of 1.2 compared to the previous year. More details and corresponding graphs are available later in the section “Speed and stability improve together”.

In 2021, joint improvements of speed and stability of software delivery continued. The production deployment lead time for all deployments done in the year so far was reduced by a factor of 2.1 compared to the previous year. At the same time, the production deployment failure rate for all deployments done in the year so far fell by a factor of 1.7 compared to the previous year. More details and corresponding graphs are available later in the section “Speed and stability improve together”.

Easy transformation wins

Although the transformation has been a long and difficult process, there were some easy wins along the way. These are listed in the table below.

 


From Scrum to Kanban process

Before the transformation, our teams were required to work according to the Scrum process. At some point during the transformation, the teams were given the choice of selecting the Scrum or Kanban process. Within a few weeks, the majority of our teams voluntarily switched to Kanban. The teams enjoyed the freedom provided by the Kanban process: just-in-time backlog item grooming, just-in-time functionality demonstration when the work on a backlog item got finished, and any time prioritization of the backlog by the product owner except for the backlog items currently in work. Kanban is used by most of our teams to this day.

From big requirements to user story identification using user story mapping

A long-standing concern of developers was that the requirements coming to the teams were too big. They took a long time to implement and were difficult to test. The introduction of user story mapping addressed this concern. It provided the teams with a structured way to break down big requirements into small user stories. Additionally, it enabled all the team members to be part of the user journey, release planning and discussions from the beginning. The teams welcomed the methodology and mastered it over time. Today, user story mapping is the default method for breaking down requirements at teamplay.

From big requirements to BDD scenario specifications by product owners

Breaking down requirements into user stories is done using user story mapping. Further breakdown of user stories can be done using BDD scenarios. This change was welcomed by the product owners, as it allowed them to convey to the developers what they wanted to get implemented using examples. This is the standard practice in teamplay teams to this day. What remains challenging is the involvement of the entire team in the definition of BDD scenarios. This is very important in order to get a set of scenarios from different angles: functional, operational, security, performance, data protection, regulatory, etc. The richer the set of scenarios, the deeper the team's understanding of the user story and the greater the test coverage, resulting in better quality for the users.

From a giant pipeline deploying several products to the idea of having an independent deployment pipeline per product

When strategizing for Continuous Delivery at teamplay, we envisioned each product being independently releasable. Therefore, an independent deployment pipeline per product needed to be implemented. This idea caught on very fast, because the teams were suffering a lot from a giant pipeline deploying all products together only once a day. Being able to deploy independently became a movement in the organization. However, the implementation of the idea was challenging, as the teams lacked the knowledge and experience of implementing independent deployment pipelines. This needed to be built up over time. Today, all new products are equipped with an independent deployment pipeline from the beginning.

From knowledge sharing sessions to using pairing as a means to share knowledge

At some point during the transformation the teams were given the freedom, and coaching, to try out pair programming. The practice caught on gradually. Today, pairing is a primary way of sharing developer knowledge, onboarding new developers and implementing challenging parts of the system. The practice did not catch on as a general way of doing programming.

Major transformation challenges

During the transformation, some major challenges were encountered, mitigated or addressed. These are presented in the table below.

 


Changing the regulatory relevant Quality Management System (QMS)

A regulatory QMS in an enterprise grows over the years based on changes in statutory laws, regulations and audit findings from all relevant geographies. This leads to the QMS containing requirements, to be fulfilled by the teams, whose origins are difficult to trace back. Additionally, any QMS change can only be made with audit-proof explanations of the reasons, and proof that the resulting process fulfills the laws and regulations in an equivalent way. Taken together, these aspects make the QMS managers reluctant to change the QMS, for fear of exposing the organization to new audit findings.

We tried the following to mitigate the challenge:

  • Taking the QMS managers for visits to companies that implemented Continuous Delivery despite operating in regulated industries

  • Hiring consultants experienced in medical device regulations to co-design QMS changes with our QMS managers

  • Educating the QMS managers in Continuous Delivery principles and methods

Changing people’s mindsets 

The transformation required a thorough rethink of all software delivery aspects by all roles in the organization. As many people, teams and groups were changing many technical, organizational and process aspects at the same time, it proved challenging to avoid opacity, ambiguity and obscurity of changes.

We tried the following to mitigate the challenge:

  • Providing a high level roadmap for changes

  • Explaining the benefits of changes and the required investments

  • Having one-on-ones with selected people

  • Providing coaching to selected people

Demonstrating quick wins of transformation

Outcomes of the transformation that are recognizable by people not actively participating in the ongoing changes (e.g. an increase in the speed of delivery) take a long time to materialise and be seen.

We tried the following to mitigate the challenge:

  • Making regular reporting presentations to leadership explaining the changes done and the outcomes achieved

  • Making presentations at company-wide events about the ongoing transformation and the benefits realised

Transform while you perform

The business demand for new features has been high during the transformation years. Splitting development capacity between new feature implementation, transformation activities and operations of existing features turned out to be very challenging. The resulting pace of transformation was rather slow.

We tried the following to mitigate the challenge:

  • Providing the teams with the guidelines for capacity split

  • Providing the teams with an understanding of gains and benefits to be reaped when investing in transformation

Architectural decoupling of products

When the transformation started, all teamplay products were tightly coupled architecturally. It took a very long time to perform the necessary architectural decouplings enabling independent product releases. The prioritisation of the architectural decouplings against new features has been a major challenge.

We tried the following to mitigate the challenge:

  • Bringing architectural changes into the portfolio management, increasing visibility at the organizational level

  • Explaining to product owners the benefits of the independent product releases that the architectural decouplings would enable, so that the decouplings could be prioritised

  • Educating the architects about the necessary architectural decouplings, reasons for them, benefits of performing them and the outcomes that could be achieved with them

  • Celebrating successes when a major architectural decoupling was done

  • Fostering knowledge sharing between the teams putting them in a position to better estimate the time it would take to re-architect services

Making TDD a default software development method

Although TDD is a fundamental Continuous Delivery method, making TDD the default software development method turned out to be a major challenge. The introduction of TDD was promising, and working alongside consultants experienced in TDD was fruitful. However, as time went by, the number of teams and individuals applying TDD dwindled. We did not manage to get TDD practiced in the ongoing programming work of the teams.

We tried the following to mitigate the challenge:

  • Encouraging teams to apply TDD to newly developed code

  • Encouraging teams to refactor code being worked on to prepare it for future TDD work

  • Offering hands-on code craftsmanship classes on demand
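For readers unfamiliar with the method, a minimal red-green-refactor TDD cycle looks as follows; the deployment-window rule is an invented example, not a teamplay policy:

```python
from datetime import date

# A minimal TDD cycle (the deployment-window rule is invented):
# 1. Red: write a failing test for the next small behaviour, before the code.
def test_no_weekend_deployments():
    assert is_deployable("2021-06-07")      # a Monday
    assert not is_deployable("2021-06-06")  # a Sunday

# 2. Green: the simplest implementation that makes the test pass.
def is_deployable(day_iso):
    return date.fromisoformat(day_iso).weekday() < 5  # Mon=0 .. Fri=4

# 3. Refactor: clean up the code with the passing test as a safety net.
test_no_weekend_deployments()
```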

Overall, a long-term process change involving people, process, technical, organizational and regulatory changes will be challenging by definition. A strong vision and courageous leadership are required to grow and sustain the momentum over time. Showcasing and celebrating successes on the journey goes a long way towards keeping people highly motivated to weather the emerging challenges.

Because different aspects of the transformation can unfold in a large number of ways, it is nearly impossible to plan the process in detail. A viable alternative to planning is to state high-level business goals for the transformation and quantify them appropriately, then let the teams define intermediate milestones and explore their paths towards the goals.

Major transformation opportunities

Although the transformation has been a lengthy and demanding process, it held the key to major opportunities. These are listed in the table below.

 


Enable fast experimentation with business ideas

Operating in the rather new digital health market, there is little prior knowledge of which product categories and specific products would resonate with the market. The same holds true for the willingness to pay and for competitive price points for products and services. In this environment, experimentation with business ideas is the central point of business activity. The ability to explore the markets using fast business experiments confers a great competitive advantage. In the healthcare domain, this translates to a product demand of roughly fortnightly to monthly releases. For medical device software products, monthly releases are feasible after the initial regulatory submission, as long as the intended use of the product remains unchanged. For non-medical device software products, releases every 2-4 weeks are feasible even for companies governed by heavy non-medical regulations such as ISO 9001.

Change the culture of the organization towards generative

Making frequent releases requires a high degree of collaboration between different groups and teams in the product delivery organization. This naturally leads to positive changes in organizational culture towards the generative type. A generative culture is performance-oriented. It is characterised by high cooperation, risk sharing, encouragement of bridging, inquiry into failure, etc. These characteristics can be supported organically when a product delivery organization is transforming towards faster software releases. The research from the book “Accelerate” found that a generative culture drives higher software delivery and organizational performance.

Change the software delivery towards speed, stability and reliability at scale

Fast and stable software delivery necessitates efficiency in processes such as requirement engineering, development, testing, deployment, regulatory documentation generation, release and operations. This efficiency is possible not only at the team level; it can be scaled up to the organizational level. Operating efficiently maximizes the time available for the actual iterations on the products.

Speed and stability improve together

As shown in the previous section, speed and stability improved at the same time during the transformation. This can be seen in detail based on the speed and stability indicators in the pictures below.

The production release lead time trend over the years of transformation shows that for a long time, between 2016 and 2019, the lead time did not decrease. On the contrary, in 2019 the lead time increased despite all the transformation efforts. However, since 2020 there has been a significant drop in the lead time: 2.4x from 2019 to 2020 and, on top of that, 2.1x from 2020 to 2021. That is, there has been a compound 5x production lead time decrease since 2019!

The production deployment failure rate trend over the years of transformation is shown in the picture below. It shows that for a long time, between 2016 and 2018, the failure rate did not decrease. On the contrary, in 2018 it peaked at 100% despite all the transformation efforts. However, since 2019 there has been a steady drop in production deployment failure rate: 2x from 2018 to 2019, then 1.2x from 2019 to 2020 and then 1.7x from 2020 to 2021. That is, there has been a compound 4x production deployment failure decrease since 2018!
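The compound factors quoted above are simply the products of the yearly factors, which a few lines of Python can confirm:

```python
# Sanity check of the compound improvement factors quoted in the text.
lead_time_factors = [2.4, 2.1]          # 2019->2020, 2020->2021
failure_rate_factors = [2.0, 1.2, 1.7]  # 2018->2019, 2019->2020, 2020->2021

def compound(factors):
    result = 1.0
    for f in factors:
        result *= f
    return result

print(round(compound(lead_time_factors), 2))     # ~5x lead time reduction
print(round(compound(failure_rate_factors), 2))  # ~4x failure rate reduction
```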

The two pictures above demonstrate that since 2020, the speed and stability of software delivery have been improving simultaneously. The production deployment lead time and failure rate trend downwards at the same time. That is, the software delivery is getting faster and more stable at the same time.

The following reasons led to speed and stability improving only after several years into the transformation:

  • The knowledge of Continuous Delivery methods was not there in the organization when the transformation began. It took a significant knowledge ramp-up and experimentation effort in order to build up the knowledge and arrive at a set of methods that are suitable for and accepted by the organization.
  • “Transform while you perform”: the transformation took place in parallel to a significant implementation of new customer-facing features using new technologies. This was a business necessity. The feature implementation took a significant share of development bandwidth leaving only the rest for transformation activities.
  • Technical changes such as architectural decoupling, creation of independent deployment pipelines per team and test refactorings took a long time to be put in place due to development bandwidth being shared between transformation and feature development activities.

Confirming research from the “Accelerate” book

The book “Accelerate: The Science of Lean Software and DevOps: Building and Scaling High Performing Technology Organizations” by Nicole Forsgren, Jez Humble and Gene Kim presents research on software delivery based on a survey of more than 400 companies around the world. The research from the book was succinctly summarized by Randy Shoup in his talk “Moving Fast at Scale” using the two pictures below.

As part of the research, two clusters of companies were uncovered: high and low performers. In terms of speed of software delivery, high performers deploy about 10 times a day with a lead time of less than an hour. The low performers deploy about once a month with a lead time of about six months.

Interestingly, there are not a lot of companies in-between the two clusters. That is, the companies surveyed tend to be either high or low performers. Even more interesting to see is that the clusters of high and low performers based on the speed of delivery, shown in the picture above, are the same for the stability of delivery, illustrated in the picture below!

High performers nearly never fail production deployments. If they do, the recovery is brought about in less than an hour. Low performers, on the other hand, fail nearly half of production deployments. The recovery from a production deployment failure occurs in more than a day.

Due to the clusters of high and low performers being the same for speed and stability, it can be concluded that in software delivery, speed and stability go together. In fact, the faster the software delivery, the more stable it becomes.

This might sound counterintuitive. However, the rigorous research from the book proves it. During our software delivery transformation, we could see firsthand how speed and stability of delivery improved at the same time. That is, our transformation could, indeed, confirm the research findings from “Accelerate”.

Transformation retrospective

Looking retrospectively at the transformation, the following lessons can be highlighted.

  1. The transformation was not driven in a data-driven way from the beginning. Only later in the process were the Continuous Delivery indicators of speed and stability established at the organizational and team level. We can recommend establishing the indicators from the start of the transformation. This can be done based on the book “Measuring Continuous Delivery” by Steve Smith, which details the definition and implementation of the speed and stability indicators for all stages of a deployment pipeline. The speed indicator is structured as production deployment lead time and production deployment frequency. The stability indicator is structured as production deployment failure rate and production deployment recovery time.

Having the indicators at the organizational and team level caters for a faster and deeper alignment of people in the organization regarding the goals to be achieved through the transformation. Moreover, it enables the teams to set their own intermediate speed and stability goals in a data-driven way. Most importantly, the indicators allow a proper application of the improvement kata throughout the organization formalizing continuous improvement as a general way of driving the transformation:

 

Step 1. Get the direction: expressed as long-term speed and stability goals.

Step 2. Grasp the current condition: assessed by looking at the current speed and stability indicator values.

Step 3. Establish your next target condition: expressed as the next set of speed and stability goals.

Step 4. Conduct experiments to get there: make technical, process, organizational and other changes, and measure their impact using the speed and stability indicators.
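The speed and stability indicators can be computed directly from a deployment log. The sketch below assumes a simple record format (commit time, production deploy time, failure flag, recovery time); the actual teamplay data model is not described in the article:

```python
from datetime import datetime, timedelta

# Minimal sketch of the four speed and stability indicators;
# the deployment-record fields and values are assumptions for illustration.
deployments = [
    # (commit time, production deploy time, failed?, recovery time if failed)
    (datetime(2021, 6, 1, 9), datetime(2021, 6, 2, 9), False, None),
    (datetime(2021, 6, 3, 9), datetime(2021, 6, 3, 17), True, timedelta(hours=2)),
    (datetime(2021, 6, 7, 9), datetime(2021, 6, 8, 9), False, None),
    (datetime(2021, 6, 9, 9), datetime(2021, 6, 9, 13), False, None),
]

def lead_time_hours(deps):
    """Speed: mean commit-to-production lead time in hours."""
    hours = [(deploy - commit).total_seconds() / 3600
             for commit, deploy, _, _ in deps]
    return sum(hours) / len(hours)

def deployment_frequency(deps, days):
    """Speed: deployments per day over the observed window."""
    return len(deps) / days

def failure_rate(deps):
    """Stability: share of deployments that failed in production."""
    return sum(1 for d in deps if d[2]) / len(deps)

def mean_recovery_hours(deps):
    """Stability: mean time to recover from failed deployments, in hours."""
    recoveries = [r.total_seconds() / 3600 for _, _, failed, r in deps if failed]
    return sum(recoveries) / len(recoveries) if recoveries else 0.0

print(lead_time_hours(deployments), failure_rate(deployments))
```

Tracking these four numbers per team and per year is enough to drive the improvement kata above: each yearly value is a "current condition" against which the next target condition is set.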

  2. The introduction of SRE and operability in general was done in the later years of the transformation. We can recommend weaving these aspects into the transformation process earlier. This caters for growing the “you build it, you run it” attitude, processes and tools more organically during the transformation, rather than their being perceived as yet another big transformation step once Continuous Delivery has been established. Moreover, the SRE indicators of reliability can be used in the improvement kata on a regular basis.

For instance, an availability indicator can be established as service availability rate and time to re-establish availability on loss. Long-term and intermediate goals can be defined and iterated towards under the guidance of the availability indicator.
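As a sketch of such an availability indicator, assuming downtime is tracked per incident (the numbers below are invented for illustration):

```python
# Availability indicator sketch: availability rate over a window, plus
# mean time to re-establish availability per outage. Data is invented.
WINDOW_MINUTES = 30 * 24 * 60  # a 30-day observation window

outages_minutes = [12, 45, 3]  # downtime per incident within the window

# Service availability rate: share of the window the service was up.
availability = 1 - sum(outages_minutes) / WINDOW_MINUTES

# Time to re-establish availability on loss, averaged over incidents.
mean_time_to_restore = sum(outages_minutes) / len(outages_minutes)

print(f"availability: {availability:.4%}")
print(f"mean restore time: {mean_time_to_restore} min")
```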

  3. In terms of working in new ways, we learned that providing people with both knowledge of new methods and coaching by people proficient in using the methods is a powerful combination. That is, when allocating time for people to learn, just providing access to learning materials or training is not sufficient to change habits. By the same token, just providing access to coaches is not sufficient to build enough understanding of the reasons to be coached. Consequently, coaching alone does not change habits either.

By contrast, a combination of initial learning and a prolonged period of application of what was learned under the guidance of an experienced coach tends to change habits at the individual and team level. The initial learning establishes an understanding of the underlying reasons for, benefits of and contexts for the application of new methods. The subsequent application of the new methods paired with an experienced coach ingrains the new methods into daily work with a frequent and powerful feedback loop adjusting the process in-the-small.

  4. Further reductions of production deployment lead time are made by decoupling services in terms of architecture, test, deployment, regulatory compliance, release and operations. With the growing number of independent services, lightweight governance is required in order to centrally assert organization-wide best practices, agreements and regulations, while leaving as much independence as possible to the teams.

The governance can beneficially be established using, for example, an opinionated platform that codifies the most important control points. These activities should be part of the transformation, rather than something to be established later once the transformation has shown signs of success. This way, the necessary governance can also be grown organically, beginning with very small steps such as the naming of services and environments.
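As an example of such a very small governance step, a naming-convention check for services and environments might look like the following; the convention itself is an invented assumption, not a teamplay standard:

```python
import re

# Invented governance convention: kebab-case service names and a fixed
# set of environment names, checked centrally before deployment.
SERVICE_NAME = re.compile(r"^[a-z][a-z0-9]*(-[a-z0-9]+)*$")  # e.g. image-viewer
ENVIRONMENTS = {"dev", "test", "staging", "prod"}

def check_deployment_name(service, environment):
    """Return a list of governance violations (empty means compliant)."""
    problems = []
    if not SERVICE_NAME.match(service):
        problems.append(f"service name {service!r} is not kebab-case")
    if environment not in ENVIRONMENTS:
        problems.append(f"unknown environment {environment!r}")
    return problems

assert check_deployment_name("image-viewer", "prod") == []
assert check_deployment_name("ImageViewer", "production") != []
```

Codifying such checks in a shared platform keeps them cheap to enforce while leaving everything else to the teams' discretion.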

We started the governance activities late in the transformation and can recommend growing the practice as the first independent services are being built up.

Summary

Accelerating software delivery is a long-term transformation process. The process can be steered effectively using speed and stability goals, and measured using respective indicators. The research from the popular book “Accelerate” says that speed and stability go together. Following this, both measures should improve at the same time. The software delivery transformation at the Siemens Healthineers teamplay digital health platform confirmed the research from the book. During the transformation, speed and stability improved at the same time.

Acknowledgments

We would like to acknowledge the following people who were instrumental in driving the software delivery transformation at the Siemens Healthineers teamplay digital health platform: Thomas Friese, Carsten Spies, David Schottlander, Fabio Giorgi, Frank Schneider, Frank Stanischewski, Philipp Guendisch, and many others.

Furthermore, we would like to acknowledge the following Continuous Delivery consultants whose help was invaluable during the transformation: Dave Farley from Continuous Delivery Ltd as well as Neha Datt, Ryan Bayly, Marcel Britsch, Keerthana Jayaram, Louis Abel, and many others from Equal Experts Ltd.

Finally, thanks go to Akshith Rai for maintaining the Continuous Delivery indicators at teamplay.

About the Author

Dr. Vladyslav Ukis graduated in Computer Science from the University of Erlangen-Nuremberg, Germany, and later from the University of Manchester, UK. He joined Siemens Healthineers after each graduation and has been working on Software Architecture, Enterprise Architecture, Innovation Management, Private and Public Cloud Computing, Team Management, Engineering Management and Digital Transformation. Since 2018, he has been holding the software development lead role driving Continuous Delivery and SRE Transformation for the Siemens Healthineers teamplay digital health platform and applications. Since 2021, he has additionally been holding a reliability lead role for all the Siemens Healthineers Digital Health products.
