In 2006, when I began to work at Scuderia Ferrari, the CIO asked us to increase the speed of software development to help Scuderia Ferrari stay ahead of competitors, and at the same time to increase reliability to avoid show-stopper bugs that can jeopardise a Grand Prix victory or the title race.
Indeed, in F1 software is a competitive advantage: inside the car, on the pitwall, in the home factory, during the car’s development, during testing, during the preparation for a race, and during the race.
You won't be surprised to learn that software legends have worked in F1, such as Neil Konzen (one of Microsoft's earliest employees, head of Microsoft's Macintosh programs projects, and creator of the second version of Windows) and Paolo Polce, a former member of the famous Connextra team (one of the first, most prolific and longest-running teams to adopt eXtreme Programming).
Public domain picture from Wikimedia Commons
Formula 1 is the pinnacle of motorsport: races are broadcast live worldwide to one of the largest global audiences, and it’s a multibillion-dollar business.
The F1 test and race calendar is pretty dense, with many official events occurring almost every week, and between the end of an F1 season and the beginning of the next one, frenetic activities continue for the development of the new car.
During 2006, and the following years, it wasn’t uncommon for us to work simultaneously on a test event, a race event at a different track, and car development at the home factory.
As our CIO used to say, the race starts at one o’clock and the deadline cannot be postponed. Good features finished on time turn into competitive advantage, unfinished work doesn’t, and defects turn into a hindrance for the team and the drivers.
Reality checks mandated by the F1 calendar are inescapable. There’s no extra time to hide mistakes, waste, or inefficiencies: after each race, it’s already time to start working on the next one.
The positive pressure from this challenge, the talent of the people involved, the frequent reality checks mandated by the F1 calendar, and a bit of luck led us to pioneer continuous delivery practices from 2006 onwards.
Two of these novel practices were strictly related to coding. They emerged during 2006-2007, and I have not seen them documented since. So I decided to document them in this article, and to ask whether you have seen them emerge in other teams, in what forms and under what names. Here they are:
- Latent-to-live code pattern: a gradual transition of latent code to live code, that provides early feedback from production on features under development, while keeping the software and the code-base always in a releasable state.
- Forward compatible interim versions: a way to deal with complex changes that break backward compatibility, without giving up remediation plans such as automated deploy rollbacks.
These two novel practices were built on top of two other novel practices both essential in continuous delivery: trunk-based-development and feature toggles. While these two latter practices are well known and documented nowadays, they were almost unknown at the time.
Short descriptions of latent code patterns and remediation plans are available in the free Continuous Delivery Overview booklet published by InfoQ.
Trunk-based-development & feature toggles overview
Teams adopting continuous delivery frequently check in potentially releasable code to the mainline (e.g. trunk or master) as part of the practice called trunk-based-development. Paul Hammant, former principal consultant at ThoughtWorks, describes trunk-based-development in great detail in What is Trunk Based Development?
A special type of feature toggle, called a release toggle, enables teams to check in, integrate, and verify code frequently, before a feature is complete, and without exposing half-completed features in production. This leads to significantly reduced integration problems and allows a team to develop cohesive software more rapidly.
Martin Fowler has described Feature toggles and release toggles.
In this example the team has to enhance the behaviour of an existing feature that computes tyre degradation for a race simulation application. Tyre degradation here means the loss of performance caused by tyre wear, measured as the time difference between an ideal lap time on new tyres and the lap time on tyres that have been used for a certain number of laps. The existing feature computes tyre degradation using only 3 variables. Here is an example:
    public double CalculateTyreDegradationDeltaTime(lapNumber,
                                                    trackId,
                                                    compound)
    {
        /* Here goes the implementation of the tyre degradation */
        /* computation with 3 variables. */
    }
To enhance the feature, developers have to add a new variable to the calculation, add it to the user interface so that a user can view and change it, and add it to the data storage so that all 4 variables of each tyre’s degradation can be stored. Here is an example of the new function, where the new variable is the difference between the tyre’s optimum operating temperature and its current temperature:
    public double CalculateTyreDegradationDeltaTime(lapNumber,
                                                    trackId,
                                                    compound,
                                                    temperatureDelta)
    {
        /* Here goes the enhanced computation that */
        /* takes into account also the temperature. */
    }
A feature toggle in the front-end is used to hide the additional variable from the user interface until the enhanced feature is completed.
In the back-end, the behaviour based on the 4 variables calculation is temporarily added side-by-side to the 3 variables calculation.
Meanwhile the front-end is still wired to the 3 variables version of the calculation.
The following picture visualises this scenario, with the arrow visualising the wiring:
Incomplete code changes are visualised in the picture above with the dashed line rectangles.
At this point the incomplete code changes, made to enhance the existing feature, are latent code that may be shipped to production, but they won't show up and won't be executed because of the feature toggle.
At this point the latent code can be executed only in the development and test environments, by automated tests, while the race simulation application remains in a releasable state.
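The situation described above can be illustrated with a minimal Python sketch. It is not the original Ferrari code: the `Toggles` class, the `temperature_delta_enabled` flag, the `_4` suffix on the new function, and both formulas are hypothetical placeholders chosen only to make the example runnable.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Toggles:
    """Release toggles; the flag name is a hypothetical example."""
    temperature_delta_enabled: bool = False


def calculate_tyre_degradation_delta_time(lap_number, track_id, compound):
    """Existing 3-variable calculation (placeholder formula)."""
    return 0.05 * lap_number


def calculate_tyre_degradation_delta_time_4(lap_number, track_id, compound,
                                            temperature_delta):
    """Latent 4-variable calculation: shipped, but not wired to the front-end."""
    return 0.05 * lap_number + 0.01 * abs(temperature_delta)


def visible_input_fields(toggles):
    """Front-end: the 4th input appears only when the release toggle is on."""
    fields = ["lap_number", "track_id", "compound"]
    if toggles.temperature_delta_enabled:
        fields.append("temperature_delta")
    return fields
```

With the toggle off, which is assumed here to be the production default, `visible_input_fields(Toggles())` lists only the three original inputs, and nothing in the front-end calls the 4-variable function. Automated tests can still call it directly.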
We gradually moved toward feature toggles and away from branching and merging as we found that it was faster, safer, and more flexible.
Feature toggles also enabled us to make incomplete features selectively available to the requesters so they could access the feature and help us to validate it. At the same time, we moved toward trunk-based-development, which enabled us to support only one version and made it easier to have a single reliable and repeatable way to release software changes, even in emergencies. It also enabled us to always keep the race simulation application code-base in a releasable state, giving us the ability to react quickly and safely to unexpected requests coming from the test track, the race track or the home factory.
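One possible way to make an incomplete feature selectively available to its requesters is a toggle that is evaluated per user. The sketch below is only an assumption about how this could look: the function name, the configuration layout and the user names are invented for illustration.

```python
def is_feature_enabled(feature, user, enabled_for):
    """Return True when `feature` is switched on for `user`.

    `enabled_for` maps a feature name to the set of users allowed to see it.
    The special entry "*" switches the feature on for everyone.
    """
    allowed = enabled_for.get(feature, set())
    return "*" in allowed or user in allowed


# Hypothetical configuration: only the engineer who requested the
# enhancement sees the temperature input while it is still incomplete.
ENABLED_FOR = {"temperature_delta": {"race_engineer_1"}}
```

Releasing the feature to everyone then becomes a configuration change (replacing the set with `{"*"}`) rather than a code change.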
This article, by ThoughtWorks, further describes the benefits of trunk-based-development: Enabling Trunk Based Development with Deployment Pipelines.
Before feature toggles and trunk-based-development, I remember Adriano, one of the team’s most experienced senior software engineers, wearing a hachimaki headband during emergency branching and merging, and the whole team swarming around Andrea, the team’s other most experienced senior software engineer, dealing with risky ad-hoc activities for emergency releases. After adopting trunk-based-development and feature toggles, those scary moments were gone.
Public domain picture from Wikimedia Commons
Latent-to-live code pattern
The latent-to-live code pattern is the process of gradually putting latent code to use in production before the related feature, or feature change, is made available to users. The code stays invisible to the user while the team collects valuable learnings from the execution of the new or changed code.
The principle of latent-to-live code pattern is to learn faster, creating and collecting valuable feedback from production sooner and more often.
The example below describes one way in which the code of a feature enhancement, invisible to the user, was released and executed in production, providing evidence of how it behaved. It is useful to demonstrate the concept in practice. Please bear in mind that there are many ways to implement a latent-to-live code pattern; it’s up to you to find the simplest way that works for the feature you are implementing or enhancing.
In this example, the behaviour of the existing calculation with the 3 variables mentioned before (lapNumber, trackId, compound) is a special case of a calculation with 4 variables where the value of the 4th variable (temperatureDelta) is set to zero. So when the implementation of the 4 variables calculation is completed, the existing implementation of the 3 variables calculation can be replaced with a call to the 4 variables calculation, passing zero as the 4th argument. Here is an example:
    public double CalculateTyreDegradationDeltaTime(lapNumber,
                                                    trackId,
                                                    compound)
    {
        return CalculateTyreDegradationDeltaTime(lapNumber, trackId,
                                                 compound, 0);
    }

    public double CalculateTyreDegradationDeltaTime(lapNumber,
                                                    trackId,
                                                    compound,
                                                    temperatureDelta)
    {
        /* Here goes the enhanced computation that */
        /* takes into account also the temperature. */
    }
This scenario is described in the following picture. The arrows show the front-end calling the existing 3 variables calculation in the back-end, which in turn is calling the new 4 variables calculation:
All the automated tests targeting the original feature now exercise, at least in part, the new enhanced calculation with 4 variables.
Once the code is shipped into production, the incomplete feature enhancement is not exposed in the front-end, thanks to the feature toggle, while the 4 variables calculation is executed live every time the original feature invokes the 3 variables version of CalculateTyreDegradationDeltaTime.
From running the new calculation in production we can learn sooner if new code works as expected for all the cases where the 4th variable is set to zero, and if it works without breaking existing features.
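The check that the delegating 3 variables version still behaves like the original can be sketched as follows. This is a Python illustration under stated assumptions: both formulas and all function names are invented stand-ins, and the only property that matters is that the new formula reduces to the old one when the temperature delta is zero.

```python
WEAR_RATE = {"soft": 0.08, "hard": 0.04}  # hypothetical seconds lost per lap


def legacy_degradation(lap_number, track_id, compound):
    """Stand-in for the original 3-variable formula."""
    return WEAR_RATE[compound] * lap_number


def degradation_4(lap_number, track_id, compound, temperature_delta):
    """Stand-in for the enhanced 4-variable formula."""
    penalty = 1.0 + 0.02 * abs(temperature_delta)
    return WEAR_RATE[compound] * lap_number * penalty


def degradation_3(lap_number, track_id, compound):
    """Live 3-variable entry point, now delegating to the new code."""
    return degradation_4(lap_number, track_id, compound, 0.0)


def mismatches(cases, tolerance=1e-9):
    """Return the cases where the delegating version departs from the legacy one."""
    return [
        case for case in cases
        if abs(degradation_3(*case) - legacy_degradation(*case)) > tolerance
    ]
```

An empty result from `mismatches` for a representative set of cases is the kind of evidence the existing automated tests, and later production usage, provide about the new code before the 4th variable is ever exposed.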
Forward compatible interim versions
At this point in our example, the implementation of the 4 variables calculation in the back-end is completed. The addition of the 4th variable in the front-end is also completed, but it remains invisible in production thanks to the feature toggle.
The front-end is still wired to the old 3 variables calculation. On the storage side, only 3 variables are currently persisted and retrieved for each tyre. This is shown in the following picture, which also shows the number of the current version, v13.
In this example, the remaining change to the data store (for example adding a column or replacing a table) breaks backward compatibility, because previous versions of the software won't work with the updated data store. Furthermore, reverting the changes to the data store would be very difficult, if not impossible, and time consuming, so it’s not practicable.
As a consequence, once the data store in production is updated, it won't be possible to roll back to the previous version of the software if a show-stopper bug pops up.
This would leave the team without a viable remediation plan for this update.
While schema-less data stores can make things simpler, every time there are two moving parts and a change that breaks backward compatibility, the team faces a similar challenge.
Think, for example, of a scenario where a client-server protocol used by many clients needs a change that breaks backward compatibility, or of a change to a data store that affects large volumes of data and all the code accessing the data store.
This is where forward compatible interim versions come in handy. They are intermediate versions (v14 and v15 in this example) that enable the team to cross the bridge of a breaking change while remaining able to go back if needed.
You can read on Wikipedia about Forward compatibility, while the free InfoQ booklet mentioned before also describes forward compatible interim versions.
In our example, version v14 can work with the original data store and also with the new one that can store the 4th variable. In v14 the tyre degradation feature can automatically detect the presence or absence of the updated data store:
The automatic detection can be implemented, for example, by looking up the version of the data store in use, or by trying to access the new data store and falling back to the old one in case of failure. The result of this detection, like a feature toggle, also enables or disables the new functionality in the front-end.
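As an illustration of the detection step, here is a minimal Python sketch using SQLite from the standard library. It assumes, purely for the example, that the data store is a table named `tyre_degradation` and that the update adds a `temperature_delta` column; the real store, names and detection mechanism may well have been different.

```python
import sqlite3


def has_temperature_delta(conn):
    """Detect the updated store by checking for the (hypothetical) new column."""
    columns = [row[1] for row in conn.execute("PRAGMA table_info(tyre_degradation)")]
    return "temperature_delta" in columns


def load_variables(conn, tyre_id):
    """Read one tyre's variables from either layout of the store.

    Returns the variables plus a flag that the front-end can use, like a
    feature toggle, to show or hide the temperature input.
    Assumes a row for `tyre_id` exists.
    """
    new_store = has_temperature_delta(conn)
    extra = ", temperature_delta" if new_store else ""
    row = conn.execute(
        "SELECT lap_number, track_id, compound" + extra +
        " FROM tyre_degradation WHERE tyre_id = ?",
        (tyre_id,),
    ).fetchone()
    variables = {"lap_number": row[0], "track_id": row[1], "compound": row[2]}
    if new_store:
        variables["temperature_delta"] = row[3]
    return variables, new_store
```

The same code runs unchanged before and after the store is updated, which is the property that lets a version like v14 be deployed first and the data store change follow later.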
This version of the software, v14, is released into production with the original version of the data store, and it’s left in production long enough to ensure that it’s stable, without any show-stopper bug. If something goes wrong, it’s still possible to roll back to the previous stable version, v13, with just one click.
The next step is to release version v15, which primarily includes the new updated data store. Note that the old store remains available side-by-side for now (for example as an old column or an old table). Note also that in some cases the software in v14 and v15 may be identical, with the only change being in the deploy package, which includes the update for the new data store.
At this point it’s still possible, as a remediation plan, to roll back to v14 if there’s a show-stopper bug in v15, or to remove the new updated data store if there’s a problem there.
When v15 has been in production long enough to ensure that it’s stable, without show-stopper bugs, the old data store can be removed, while v14 remains a valid rollback option. Here is the picture of v15:
In the following update, version v16, the code that enables the feature to work with both the old and new data store and related feature toggles can be removed; we have crossed the bridge (we are up and running with the new data store) and we don’t need to cross it back (to use the old data store or to run an old version of the software such as v13 that needs the old data store) anymore.
The forward compatible interim versions in this example, as mentioned before, are indeed v14 and v15.
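The rollback options along this sequence can be summarised as a small compatibility table. The Python sketch below is only a model of the example above, not tooling we used: it records which data store layouts ("old" and "new", names chosen for illustration) each version can run against, and derives which earlier versions remain deployable.

```python
# Store layouts each software version can run against, as in the example.
SUPPORTED_STORES = {
    "v13": {"old"},
    "v14": {"old", "new"},
    "v15": {"old", "new"},
    "v16": {"new"},
}


def can_run(version, available_stores):
    """A version can run if at least one layout it supports is available."""
    return bool(SUPPORTED_STORES[version] & available_stores)


def rollback_targets(current, available_stores):
    """Earlier versions that could still be deployed over the current store."""
    versions = list(SUPPORTED_STORES)  # insertion order: oldest first
    earlier = versions[:versions.index(current)]
    return [v for v in earlier if can_run(v, available_stores)]
```

For instance, `rollback_targets("v14", {"old"})` gives `["v13"]`, and once the old store has been removed `rollback_targets("v15", {"new"})` gives `["v14"]`: v13 is no longer an option, but an interim version still is.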
At this point we were able to manage fast and frequent releases and compatibility breaking changes with confidence, in a reliable and repeatable way, without dramas, and without the need to trade speed for quality or safety.
Conclusions
These techniques emerged during 2006-2007 while I was working for Scuderia Ferrari.
During these same years other teams around the world, unaware of each other, were autonomously inventing and pioneering continuous delivery and the related techniques.
In 2006 ThoughtWorks published their first post mentioning continuous delivery, and in 2010 they released their book on the subject.
In the introduction I mentioned that two of the ingredients that led us to this discovery were positive pressure and a bit of luck. This is what positive pressure and luck looked like for us:
- The constraint on the number of races and test events where all software is used in a real production environment pushed us to maximise learnings at every GP and test event.
- A bug in the merge feature of the source code repository led us to invent our version of trunk-based development.
- For security reasons, we had to periodically introduce breaking changes, which in turn caused pain for our remediation plans. This led us to invent the forward compatible interim versions technique, which has since proved extremely useful for all changes that break backward compatibility.
These techniques enabled us to go faster and safer at the same time, and benefits were immediately visible to us.
Further reading:
- Continuous Delivery book, Jez Humble
- First ever post/article about Continuous Delivery, ThoughtWorks
- State of DevOps report 2016, Puppet
- From Continuous Integration to Continuous Delivery and DevOps, slides, Luca Minudel
- Software development in Formula One, slides, Luca Minudel
About the Author
Luca Minudel is a Lean-Agile Coach & Trainer with 14 years of experience in Lean/Agile and 20+ in professional software delivery.
He is founder and CEO at SmHarter.com, a company that helps organisations turn their way of working into their competitive advantage.
He contributed to the adoption of lean and agile practices by Ferrari's F1 racing team. For ThoughtWorks he delivered training, coaching, assessments and organisational transformations in top-tier organisations in Europe and the United States. He worked as Head of Agility in 4Finance. Luca is passionate about agility, lean, complexity science, and collaboration.