Bio: Dave is currently working in the area of high performance computing in the finance sector for Getco Ltd. Dave was an early adopter of agile development techniques, employing iterative development, CI and significant levels of automated testing on commercial projects from the early 1990s. Dave is co-author of the book "Continuous Delivery" and was part of a small team who created the LMAX Disruptor.
Software is Changing the World. QCon empowers software development by facilitating the spread of knowledge and innovation in the developer community. A practitioner-driven conference, QCon is designed for technical team leads, architects, engineering directors, and project managers who influence innovation in their teams.
1. Hi, my name is Peter Bell and I am here at QCon New York with Dave Farley. Dave, thank you for taking the time to come out. I wanted to ask you some questions about Continuous Delivery, the practice you work on and, of course, the book that you wrote with Jez Humble. For anyone who is new to Continuous Delivery, how would you describe it, what is it about?
I think there are several different ways of looking at it. Perhaps the most straightforward, for people that are familiar with Continuous Integration, is that it is the extension of continuous integration across the whole software development life cycle. My own take on this is that – I am a popular science nerd and so I’m very keen on trying to apply that kind of rational thinking to my day job of writing software – for me Continuous Delivery is trying to use the scientific method, trying to achieve verifiability and trying to apply a sceptical mind to the work that I do. So I don’t want to make assumptions, I don’t want to make guesses about the ways in which my software is going to be robust, fulfil its functional needs and be deployed into production. I want to be able to assert those things before I release them. For me Continuous Delivery is trying to do that: trying to automate as much of the development process as seems useful and leave human beings to do the creative bits.
2. Obviously there are some things that many developers would be familiar with in moving down that road, basic things like unit test coverage and perhaps acceptance test coverage. How do you go further? What are some of the ways that you can verify more than you get with unit and acceptance tests?
I think a lot of it is just not trusting one’s own judgment and retaining that sceptical mind, looking for anything that can go wrong. My recent background has been in the area of high performance finance, so we have been writing very low latency systems, and performance testing is part of the pipeline too. So every commit, every change that will go in, will be validated against a series of unit tests to show that the code is doing what the developers think it should be doing. They’ll be validated against a series of acceptance tests that show that the code does what the users would like it to do, and they’ll be validated against a series of performance tests that show that the performance characteristics of the code are good. And not only that, but there are other areas too, at LMAX, one of the companies where I was working during the course of writing the “Continuous Delivery” book.
We did validation of what would classically be thought of as non-functional requirements: we would selectively kill bits of the application while it was running, all under the control of an automated test, and validate that when those components were restarted the state of the system was coherent and consistent. So really, anything that can go wrong: the deployment of the software into production, rehearsing that, asserting that it works before you get to the day when you push the button and it’s released. All of those things need to be verifiable. The configuration of the system is one of the things that is commonly missed and treated as a second-class citizen next to the algorithms that we write. But I can break your software just as quickly, maybe more quickly, by making the configuration invalid rather than by changing the code.
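The validation stages Dave describes can be sketched as a minimal pipeline: each commit is promoted through successive gates, and any failure stops it. This is an illustrative sketch only; the stage functions are hypothetical placeholders, not a real build system.

```python
# A minimal sketch of a deployment pipeline: a commit is promoted
# through successive validation stages, and any failure rejects it.
# All stage functions here are hypothetical stand-ins.

def unit_tests(commit):
    """Does the code do what the developers think it should?"""
    return True

def acceptance_tests(commit):
    """Does the code do what the users would like it to do?"""
    return True

def performance_tests(commit):
    """Are the latency/throughput characteristics acceptable?"""
    return True

PIPELINE = [unit_tests, acceptance_tests, performance_tests]

def validate(commit):
    """Run the commit through every stage; reject on the first failure."""
    for stage in PIPELINE:
        if not stage(commit):
            return "rejected at " + stage.__name__
    return "release candidate"

print(validate("abc123"))  # every stage passes, so this is a release candidate
```

In a real pipeline each stage would shell out to a test runner or deployment tool; the point is only that promotion is automatic and strictly sequential.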
Peter: And so it’s just taking a more holistic view, saying "What are all the ways this could break?" rather than just the simplistic ways it might break on a developer’s laptop.
Peter’s full question: Now a lot of people, I think, confuse Continuous Delivery with continuous deployment. I’ll often get the feedback: “Well, you know, we have to ship CDs to clients, so from our perspective it doesn’t make sense to update on a regular basis”. And yet lots of the poster children for Continuous Delivery, I think, are also doing continuous deployment – companies like Etsy. So what would you say to somebody who doesn’t want to push to production thirty times a week, for whom that isn’t appropriate? Are there still benefits from Continuous Delivery?
Yes, I think there are extensive benefits. The distinction I would make between Continuous Delivery and continuous deployment is that Continuous Delivery is developing code in a way, working in a way, in which your code is always fit for purpose and ready for release. That doesn’t necessarily mean that you have to make the decision to release. That can be, and should be, a separate business decision: whether you want to release this now or later. But at any given point you’re in the position to be able to release. So if you’ve made a release and you find a business-critical bug in your software, how long is it going to take you to safely, professionally go through all of the evaluations in order to deploy the correction of that bug, without cutting any corners? That’s an important and valuable aspect of this. Continuous deployment, the process where every commit will, on an automated basis, make it through into production, is probably at the bleeding edge of this stuff right now. And many big organisations are doing this; this is how Amazon, Facebook, Etsy and people like that are working, and it works really well for them. But it’s not for everybody. All of the examples
I’ve just mentioned happen to be web-centred companies. We were building a financial exchange. We couldn’t do continuous deployment because it would affect our latency; it would slow down the rate at which trades were processed during the course of switching over. We agonised over whether we could do it and tried to think of ways of doing it, but it didn’t make sense for us. Maybe we could have solved that technical problem, but it wasn’t of value to the business at the time. So I think that’s the distinction. It’s a matter of figuring out what suits the business, but working in a way that keeps the software deployable is a very, very healthy practice.
Peter: So what are some of the preconditions before you are really mature enough, or ready, to actually start doing Continuous Delivery? Because I’ve seen examples where I’m talking with a developer and it’s “Yes, we would love to do Continuous Delivery but we don’t actually write tests”.
Well, I think that’s a precondition. With Continuous Delivery, if you’re trying to make sure that your software is permanently in a position to be releasable into production, then you have to evaluate that assertion in some way. And the way in which we do that is by writing automated tests. This doesn’t eliminate the possibility, or even perhaps the need, to do manual testing. But we don’t want to use human beings for doing dumb manual testing.
Human beings doing regression testing is, to my mind, an anti-pattern. Human beings are not very good at it, they’re not effective: it’s expensive, it’s slow, and it doesn’t catch as many bugs as automated tests. So use the automation to supplement those things. Automated testing is certainly a cornerstone of this practice. I think too, if you really want to get to the point of verifiable software in the way I was talking about before, you need to be thinking about automating the deployment of the software, because you need to be rehearsing that too, and that can’t be a manually intensive process. But it’s kind of a journey. One of the things that I say in my presentations on this topic is that to achieve this you need to start working in a learning environment. It’s not a destination in its own right, it’s a journey: you’re continually improving the process of doing this thing.
Peter’s full question: For any given organisation that’s somewhere along that journey, whether they’re just starting to write unit tests or whether they’re trying to optimize their configuration management and production deployment, how do you figure out what the next thing to do is? If I’m saying I want to improve, how do I figure out what to work on next?
Well, there are several people doing some interesting work in this area, though I’m not one of them. Eric Minick from UrbanCode has a maturity model, and gave a very good presentation at this conference, that describes, for a series of different practices that characterise Continuous Delivery, what the beginning state is, what’s kind of pathological, what the extreme target is, and the steps in between. Within each cell in that grid there are practices that you can apply. Those sorts of things are very useful for looking at where we are now, where we would like to be, and what the next steps are to achieve that, in terms of looking across the sphere of what other organisations are doing. So I think that’s a good way forward.
Yes. My experience of this is that Continuous Delivery is the kind of practice that leaks out; it’s not something that only works within a development team. It’s holistic, as I think you said earlier on. It changes the relationship between the development team and their user base and the business, and certainly at the organisations that I’ve worked for it changes those relationships for the better. So I think one of the huge anti-patterns is siloing in organisations. You can’t achieve really high levels of quality and verifiability, in my experience, by throwing things over the wall. The teams need to be working very, very closely together, communicating on a daily basis, interacting on tasks. Cross-functional teams, all focused on delivering high quality software, is the way to go. Since Jez and I wrote the book, Continuous Delivery has become part of the thinking in the DevOps movement, and the development–operations divide is one of those silos that probably needs to have the barriers broken down a little bit. The relationship with the business, and the relationship between developers and testers, are others. And I think siloing is the biggest anti-pattern, the toughest obstacle, the toughest barrier to these kinds of practices in most organisations.
6. One of the things a lot of companies in this space are playing with is the idea of feature toggles, feature flags, whatever they call them in a given organisation. What do you see as some of the main benefits of that in a Continuous Delivery environment?
Yes, that’s an interesting question. There’s a fundamental tension between any kind of branching and the practice and theory of continuous integration in general and Continuous Delivery specifically. With continuous integration, what we’re trying to get is the earliest feedback that we can that my changes work in the context of everybody else’s changes. Any branch, whether that’s a feature toggle, a feature branch within a repository like Git, or a separate branch in your Subversion repository, is by definition designed to isolate change, and so there’s a fundamental conflict between the idea of continuous integration and branching of any kind. That’s a tough problem to get around. My own view is that you can’t count yourself as doing continuous integration if you’re not submitting your change to the main line – trunk, head, whatever you call it – at least once a day.
Peter: Got it. That raises a question, because I do training with GitHub, and one of the things we talk about is feature or topic branches, and I think in many ways, for companies that haven’t started to use them, they’re a good step forward. The challenge is that once companies do start to use them, they have sixteen-month-long feature branches and wonder why the last person to integrate at the end has problems.
Peter: Although I believe Martin Fowler made one distinction, which is that having one long-running feature branch isn’t too bad as long as you’re rebasing; it’s when you have two of them that you generally run into integration problems.
Yes. If you keep merging from head that’s better than not, but you’re still fundamentally separate. And going back to your question about feature toggles, they’re doing the same thing in a sense, but for me it’s a slightly healthier way forward because you can choose your evaluation. Different companies take different strategies. Etsy, for example, when they’re running their tests, I believe run them against the production configuration of the feature toggles, which means that the unreleased features are not being evaluated in those runs; so there’s a danger that you might get a big shock later on when you turn them on and they don’t integrate with the rest of the features. The other option that I’ve seen is that people will flip the feature toggles the other way, so that in test they’ll be running with the as yet unreleased features turned on, so that you can assert that they still work with the rest of the code, but in production those features will be disabled. And then your testing is in a slightly faked version of your production environment. I think it comes down to the fact that there isn’t a perfect answer to this, because of this fundamental tension. So it’s a matter of judgment, and of project priorities, which mechanism works best, I think.
Peter: And it seems to be like each solution becomes the new problem.
Peter: Because feature branches are great in that you can release at any time from master, but now you’ve got un-integrated work. Feature toggles mean at least you’re testing that your code compiles with everyone else’s and doesn’t have meaningful semantic conflicts, but now you get this combinatorial explosion: do I want to run my acceptance tests against every possible combination of feature flags?
Peter: And that becomes its own separate issue. There was actually somebody from Etsy speaking yesterday who said: “Yes, we have hundreds of feature flags and so far it’s not been a problem”.
Yes. Clearly it works for them, so they made the right choice; they’re not having problems with that. I think either of these can work. The thing that is always going to be painful is long-running branches that stay away from head, so you need to keep merging, whatever your branching strategy. But certainly one of the keys is to branch as late as possible, and branches should live for as short a time as possible.
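The two toggle-testing strategies in this exchange can be illustrated with a tiny sketch. The flag name, the feature function and its behaviour are all hypothetical, invented purely to show how the same code path can be exercised under either flag configuration.

```python
# A sketch of the two feature-toggle testing strategies discussed above.
# "new_checkout" and the discount behaviour are hypothetical examples.

PRODUCTION_FLAGS = {"new_checkout": False}  # the feature is off in production

def checkout(total, flags):
    """The toggle selects between the released path and the unreleased one."""
    if flags.get("new_checkout"):
        return round(total * 0.9, 2)  # hypothetical new behaviour: 10% discount
    return total                      # current production behaviour

# Strategy 1 (as described of Etsy): test against the production flag values,
# so unreleased features are NOT evaluated in the test run.
assert checkout(100, PRODUCTION_FLAGS) == 100

# Strategy 2: flip the toggles in test so the as-yet-unreleased feature is
# exercised alongside the rest of the code, while production keeps it off.
test_flags = {**PRODUCTION_FLAGS, "new_checkout": True}
assert checkout(100, test_flags) == 90.0
```

Either way, one configuration goes untested, which is the "combinatorial explosion" tension Peter raises: fully covering N independent flags would take 2^N test runs.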
Peter’s full question: Now, it was actually Stephen Hardisty from Etsy, one of their engineering managers, who gave a presentation in the Lean Startup track yesterday called “Screwing Up For Less”. One of the interesting things he talked about was that we’ve discussed testing so far, which is very important, but the other side of the coin is measurement. At some point your tests aren’t going to catch everything, so the question is: how do you reduce the cost of failure? What are some of the practices or approaches people use to do that?
This is very close to my heart in terms of the appeal of the scientific method for me. I want these things to be verifiable, and ultimately companies like Etsy are showing the way forward with some of this. They’re experimenting in production and evaluating at the level of business change: is this idea better than this other idea? Capturing data and looking at those things, I think that’s the way forward. If, as software developers, what we’re looking for is to have an idea, get it into production, into the hands of users, and evaluate the value of it, surely that’s where we all want to be, and that’s a healthy place for businesses.
Peter's full question: And it feels like this is very supportive of the whole kind of Lean Startup methodology, the short build-measure-learn cycle, the idea of validating with cohort analytics and things like that. So, do you see that Continuous Delivery and Lean Startup are sympathetic or supporting approaches?
Yes, I do. I think that they come from similar roots. As I keep coming back to, my analysis is that this is the application of the scientific method to software development. Lean manufacturing and lean processes came from the same root, a conscious lift from “form a hypothesis, design an experiment, carry out the experiment, iterate”. They’re all the same thing, and the scientific method is the most important invention in human history. It’s the best, most effective way of solving problems, so it’s kind of inevitable that there will be parallel evolutions of these ideas. So I see Continuous Delivery and Lean Thinking as very well aligned in terms of approach.
Peter's full question: And one thing that I’ve noticed as a challenge in a lot of organisations is by its very nature you’re breaking things, you’re making mistakes. You may break production; this is not going to be a surprise. Are there any things you’ve noticed that have helped to change cultures where the default environment is: “Wait, you took production down, you’re fired”?
I’m not quite sure how to answer that question, but I can give you a fun war story from one of the companies that I’ve worked at, where these ideas had become prevalent across the company. We had a new starter in the Ops team who went into the server room, unplugged one of the servers and caused us some problems in live. He was scared to death that he was going to get fired. As it turned out, I thought the response was perfect: “We’re not going to fire you, you’ve just learned a most valuable lesson; you’re never going to do that again, are you? You’ve learned this terrific lesson”.
I think experimentation is fundamental, and part of experimentation is that sometimes things are going to go wrong. The trick is to try to limit the impact of the experimentation and to do the seriously dangerous experiments in safe places. So think about staged roll-outs, where you roll out to a small proportion of your user base; that’s one technique. Or think about the sorts of high levels of testing that I’ve been talking about: carry out your experiments in a live-like, but not live, environment. Crossing your fingers and hoping is not a good way forward.
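The staged roll-out Dave mentions is commonly implemented by bucketing users deterministically, so the same user always sees the same behaviour while only a chosen percentage is exposed. This is a sketch under that assumption; the user ids are illustrative.

```python
# A sketch of a staged roll-out: hash each user id into one of 100 stable
# buckets, and expose the new feature only to buckets below the chosen
# percentage. Using a hash (not random()) keeps each user's answer stable.

import hashlib

def in_rollout(user_id: str, percent: int) -> bool:
    """Return True if this user falls inside the roll-out percentage."""
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100  # stable bucket in [0, 100)
    return bucket < percent

# At 100% everyone is included; at 0% no one is.
assert in_rollout("alice", 100) and not in_rollout("alice", 0)

# The decision is deterministic: the same user gets the same answer every time.
assert in_rollout("bob", 50) == in_rollout("bob", 50)
```

If a problem shows up, dropping the percentage back to zero limits the blast radius of the experiment, which is exactly the point being made above.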
Peter: Well Dave, thank you so much for taking the time to speak with me. My name is Peter Bell, I’m here for InfoQ at QCon 2013. Dave, thank you very much.