Bio: Dave is currently working in high-performance computing in the finance sector. Dave was an early adopter of agile development techniques, employing iterative development, continuous integration (CI) and significant levels of automated testing on commercial projects from the early 1990s. Dave is co-author of the book "Continuous Delivery" and was part of the small team that created the 'LMAX Disruptor'.
Software is Changing the World. QCon empowers software development by facilitating the spread of knowledge and innovation in the developer community. A practitioner-driven conference, QCon is designed for technical team leads, architects, engineering directors, and project managers who influence innovation in their teams.
Hi. I’m here speaking at the conference and meeting a lot of old friends. Work-wise, I am working for a trading company, trying to help them improve their software development processes. We are looking to implement continuous delivery across the organization, which is an interesting challenge because it means dealing with the cultural changes around the technology.
I guess a lot depends on what you mean by “DevOps enabled”, but I think fundamentally the answer is probably no. Continuous delivery is very much a holistic process about breaking down barriers and silos in organizations, and what I believe is at the root of continuous delivery, and of other Agile practices really, is establishing feedback loops so that we can evaluate change and react to it. That means the person who causes a breakage needs to see it and have the freedom to fix the problem, and DevOps allows that. If you’ve got a silo between development and operations, then the developers write the code that breaks things, but the operations people are the ones who feel the pain. So I think there is a degree to which DevOps is probably a prerequisite for doing this really well.
I think it is; I think it’s less good, I guess I would say that. For me, DevOps really means the developers being close enough to operations to see the pain. There is much more to it than that: there is infrastructure, there is code, and certainly as you get to the more sophisticated end it starts not to make an awful lot of sense stand-alone, without the more upstream practices that continuous delivery talks about. The boundary between development and operations is just one of the interfaces within an organization that you need to get right. Certainly with small teams and smaller, simpler projects you can operate in a DevOps mode without necessarily having the high levels of automation that continuous delivery implies. I think the output is probably not going to be as good if you do that, but you can certainly gain benefit from those sorts of practices without going all out.
4. Both continuous delivery and DevOps put a very strong emphasis on, let’s say, non-functional requirements, but in many organizations business requirements get all the attention. So what are some common approaches to dealing with this issue?
First, one of the key principles of continuous delivery, which I admit is a conscious lift from Lean thinking, is that you build quality in. That implies writing code that is maintainable and operable and that fulfills those non-functional requirements. It’s my strong belief that if you do that, over time it pays back in spades; it’s a much more efficient way of working. It doesn’t feel like that, though, so it’s one of the real tensions that tends to happen in teams, even capable teams, where the business wants features and the technologists want to address some technical debt. I think an awful lot of it is about building trust on both sides: the business trusts the technologists that when they say they have to do something, they really do, and so the technologists get to focus on delivering more elegant solutions. Honestly, I don’t have any silver bullets that make that an easy decision; it comes down an awful lot to the experience and skills of the people involved and the relationships that you build.
Certainly in the organizations I’ve worked in where the continuous delivery process was mature, what happens over time is that we build up that trust, and we build it up by sometimes compromising either way. Sometimes the business doesn’t get what they want when they want it, and sometimes the technologists don’t get to do the work they want when they want to, but we have an open conversation and we recognize when each side is compromising. Those are the human things that help to build trust, and it worked quite well. Certainly at my last company, LMAX, where we did that, over time it got to the stage where, and I was part of the technology team, if we said we wanted to concentrate on something, the business would just say “OK, do it”. Occasionally, if it was a really big thing, they might want some justification, a conversation about where the value is and why we are doing it, but they quickly learned that when we did those sorts of things, we usually got good results afterwards.
5. On a slightly different topic, on the development side, methodologies are a source of debate and discussion, but on the operations side it seems it’s not so important. Why do you think this happens?
I think that depends on where the operations are; there are some fairly formal methods for operations in some organizations. But I think a lot of it is just the nature of the situation people find themselves in, in organizations that aren’t working very well. Development is a creative discipline: developers are problem solvers, always looking for ways to explore. In bad or dysfunctional organizations, the operations team is often in a fairly unpleasant place: they are getting substandard quality pushed over the wall to them and they are in constant crisis, firefighting all of the time. To a degree, they don’t have much opportunity to lift their heads above the parapet and think, and work, more strategically; they are too busy trying to limit the amount of change, so they defend the stability they have. So I think there is a degree of that. What I can say is that the people who work in those environments are generally pretty smart, they are technologists too, and when they start seeing new ways of working that work, they are keen to adopt them. So I don’t think they are anti-methodology or anything like that; very often they are just stuck in these unpleasant situations where they are fighting fires all day.
6. I think I can relate to that. Which methodology is most amenable to continuous delivery? Methodologies that follow timeboxed iterations seem at first to be less amenable than flow-based ones. What is your view?
I think either works. Fundamentally, I believe that Agile methods in general work because they are the application of the scientific method to software development. What I mean by that is that we consciously establish feedback loops at different resolutions so that we can learn. At a very fine resolution, we write a test, see the result and iterate on that; at a coarser resolution, we have ideas, get them into the hands of our users, see the effect and iterate on that. That process of feedback, and I admit I’m a bit of a pop-science nerd, is fundamental to the way effective processes work; it’s really at the root of what needs to happen. The way in which you establish those feedback loops matters less. You can do it with fixed points in time, an iterative structure every couple of weeks or so, and that has some benefits, particularly when people are learning these sorts of skills and approaches, as well as the benefit of a regular cycle: you can say “we are going to have a kick-off meeting and showcases”, and people understand that. As you get more advanced, flow is what’s really important, but iteration remains fundamental.
Most of the teams I’ve worked on, not all, but most, started off with iterations and kept them, but we kept them for human reasons: they give a nice cycle. That’s the danger with a full-on flow model, from my own perspective. I worked on one of the relatively early Lean teams when I was at ThoughtWorks, and it worked brilliantly, but I believe the team suffered from fatigue more, and sooner. With an iterative structure you tend to get a rhythm: you push for a deadline to make sure you get your work done by the end of the iteration, then you relax, everybody feels good about the success of that iteration, we can pat ourselves on the back because we did well, have our retrospective and move on, and that kind of cycle works nicely. When you are full-on Lean, the cycle is story-centered, which it should be, but stories tend not to line up: one pair might finish a story and celebrate while another pair hasn’t finished theirs, so there is a danger that you have to find other ways of bringing the team together. My favorite approach is an iterative structure in which we work so that we can release at any time: you are always working, always practicing releasing, and you can release features throughout the iteration, but we keep the iterative structure for the human side, so we can have cake at the kick-off meeting as a team and all of those sorts of things.
7. It’s more common nowadays to have geographically distributed teams, and sometimes the operations team is on a different continent and in a different company. Do you think it’s possible to do continuous delivery in this scenario, and how would you go about putting it into action?
I absolutely think it’s possible, but it has its extra challenges. Software development is a human, creative activity, and a technically difficult one, that we get together and do in teams, so an awful lot of doing it well is about the team dynamics and managing them: making sure that, at the human level, people can interact effectively. Technology can help a lot with that. For example, I work very closely with some guys at my company who are based in Chicago, and we have video-conference systems on our desks so that we can overhear what each other is talking about and interrupt one another. That sort of thing helps, but what is absolutely essential is that periodically we fly over and stay with them for a while, eat and have cake together and do all those human things; we build a bond so that we trust one another more. I think that is fundamental whatever the nature of the team, but it’s doubly important where there is room for tension. So if you’re outsourcing and there is another company involved, or, as you say, development is in one place and ops in another, I think it’s risky. All of this is much easier if everybody is sitting together in one place, but in practice that doesn’t always happen. You can absolutely do it; you just need to be thoughtful about making sure the communication works at the human level. The technology is the easy part; in my experience it is rarely an issue.
8. In heavily regulated industries like finance, for instance, there is the concept of segregation of duties. How can you keep a healthy continuous delivery practice with those kinds of concepts, when at times they may seem a bit inimical to open collaboration?
Yes, I think you are right, I think that’s challenging; in part it’s about establishing a healthy relationship with your regulator. I work in the finance industry, and at my current company we are dealing with software that was written before continuous delivery, so we are looking to migrate that. The last company I worked for was LMAX, a startup; we built a financial exchange from scratch and went through the process of getting regulated by the regulator here in the UK, so it was a heavily regulated industry where it was important to get all of that right. I think a lot of this is just about being open and honest about where the problems are. The Chinese wall, the segregation between production and development, is a difficult barrier. What we did at LMAX to manage that problem was, first, that there was absolutely no transfer of live data from production into the development environment except through a process that anonymized it. So there was no access to information that the development team shouldn’t have, but we could have the structure of the data, so we could test with it. We wrote scripts to do these transfers when we needed that kind of thing.
In terms of the individuals and their access, part of this is aligned with continuous delivery if you get it perfectly right, and even at LMAX, which was a very CD-focused organization doing it from the ground up, we didn’t always get it perfectly right. But in the perfect world nobody touches production systems; no person needs to go and tinker with them, so nobody has access to them, because that’s the job of automation. If you’re in that world, where all of your changes flow through an automated deployment pipeline and get tested and so on, it doesn’t really matter very much, because the only things that have access to production are the robots that deploy the software. Of course, occasionally there are problems. At LMAX we tried to straddle the problem, and I am not suggesting this is the perfect solution, but it worked for us for a while: we ended up having a nominated DevOps team that developers would rotate through. While they were on it, they lost access to some of the development resources and gained access to production. It’s not at all perfect, but at least it means developers experience the classes of problems that happen live, even if they aren’t seeing the specific problems they created themselves. So it’s not true DevOps, and many people who talk about DevOps will say that having a DevOps team is an anti-pattern, and I would probably agree with them. But it kind of worked for us in this regulated environment, because it gave us that slight barrier: people had more access than the development team while they were on the DevOps team, and they lost it when they rotated back into the development team.
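A minimal sketch of the kind of anonymizing transfer script described above. It assumes a hypothetical `Account` record in which the account ID, name and email count as sensitive; those field choices and the salted-hash pseudonyms are illustrative assumptions, not the process LMAX actually used. Identifiers are replaced consistently, so related records still line up and the structure of the data survives for testing.

```python
import hashlib
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class Account:
    # Hypothetical production record; which fields are sensitive is an assumption.
    account_id: str
    owner_name: str
    email: str
    balance_pence: int


def pseudonym(value: str, salt: str) -> str:
    """Stable, salted hash: the same input always maps to the same token."""
    return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()[:12]


def anonymize(account: Account, salt: str) -> Account:
    # Keep the shape and non-identifying values; replace anything that names a person.
    token = pseudonym(account.account_id, salt)
    return replace(
        account,
        account_id=token,
        owner_name=f"user-{token}",
        email=f"{token}@example.invalid",
    )


def export_for_dev(accounts: List[Account], salt: str) -> List[Account]:
    """Only anonymized copies ever leave production."""
    return [anonymize(a, salt) for a in accounts]
```

The salt would be kept on the production side, so developers cannot reverse the tokens by hashing guessed IDs.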
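The rotation arrangement can be sketched as a small access policy. This is a hypothetical model (the names `AccessPolicy`, `deploy_robots` and `devops_rotation` are invented for illustration, not a real access-control API): the deployment automation always has production access, while developers gain it only during their time on the DevOps rotation and give up some development-side access for that period.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class AccessPolicy:
    """Hypothetical model of the rotation scheme described above."""
    deploy_robots: Set[str]                               # the pipeline's deployment accounts
    devops_rotation: Set[str] = field(default_factory=set)

    def can_access_production(self, principal: str) -> bool:
        # Automation always can; people only while they are on the rotation.
        return principal in self.deploy_robots or principal in self.devops_rotation

    def can_access_dev_resources(self, principal: str) -> bool:
        # Rotation members give up some development-side access for the duration.
        return principal not in self.devops_rotation

    def rotate(self, joining: Set[str], leaving: Set[str]) -> None:
        """Move people onto and off the DevOps team."""
        self.devops_rotation = (self.devops_rotation - leaving) | joining
```

In the ideal case described in the answer, `devops_rotation` would simply stay empty and only the robots would ever touch production.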
9. On to a more technical topic. Good automated integration testing is or can be a very hard topic, especially when you have to introduce it mid-project and when the application talks with lots of different services, lots of different applications. Do you think it’s always worth the effort to bring in automation?
The simple answer to that is yes. It’s not always worth the effort, though, to backfill the level of automation you would ideally like, in my opinion. If you’re coming into a legacy system, for somebody as test-obsessed as me, the lure is to say “we’re not going to do any work for six or twelve months, we are just going to write tests to protect this piece of software”. That doesn’t make sense; it doesn’t make commercial sense. A strategy I’ve used many times with legacy systems is to talk to the business, identify the valuable use cases through the system, and write tests that defend those: arm’s-length functional tests, black-box tests of the system that assert the principal use cases. From then on, all new work is done full-on with test-driven development, acceptance-test-driven development and all of those good things. So we quickly grow a suite of tests with good coverage in the areas where we do new work, and we’ve got these defensive tests that assert we haven’t broken the principal behaviors of the system by interacting with it. Even that is an investment, but that level of investment is certainly worthwhile.
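A sketch of the defensive, arm’s-length approach: a short list of black-box checks for the principal use cases, run against whatever public interface the legacy system already exposes. The `submit` request/reply interface, the use-case names and the `FakeLegacySystem` stand-in are all assumptions made so the example runs on its own.

```python
from typing import Callable, Dict, List, Protocol


class SystemUnderTest(Protocol):
    """Whatever public interface the legacy system already exposes (assumed)."""
    def submit(self, request: dict) -> dict: ...


def place_order_is_accepted(sut: SystemUnderTest) -> None:
    reply = sut.submit({"action": "place_order", "instrument": "GBPUSD", "qty": 10})
    assert reply["status"] == "accepted", reply


def position_reflects_order(sut: SystemUnderTest) -> None:
    sut.submit({"action": "place_order", "instrument": "EURUSD", "qty": 5})
    reply = sut.submit({"action": "position", "instrument": "EURUSD"})
    assert reply["qty"] == 5, reply


# The handful of valuable use cases agreed with the business (hypothetical).
DEFENSIVE_SUITE: List[Callable[[SystemUnderTest], None]] = [
    place_order_is_accepted,
    position_reflects_order,
]


def run_defensive_suite(sut: SystemUnderTest) -> List[str]:
    """Run every black-box check; return the names of those that failed."""
    failures = []
    for check in DEFENSIVE_SUITE:
        try:
            check(sut)
        except AssertionError:
            failures.append(check.__name__)
    return failures


class FakeLegacySystem:
    """Stand-in for the real system so the sketch is runnable."""

    def __init__(self) -> None:
        self.positions: Dict[str, int] = {}

    def submit(self, request: dict) -> dict:
        if request["action"] == "place_order":
            inst = request["instrument"]
            self.positions[inst] = self.positions.get(inst, 0) + request["qty"]
            return {"status": "accepted"}
        if request["action"] == "position":
            return {"qty": self.positions.get(request["instrument"], 0)}
        return {"status": "rejected"}
```

The checks only use the system’s outer interface, so they keep working while the internals are reworked under new, test-driven code.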
I think this is a big topic that probably isn’t covered well enough. My own leaning, and what I have done on projects in the last few years, is that I believe very strongly the test cases should be expressed in the language of the problem domain. That implies some kind of domain-specific language that you create to separate the concern of expressing the test, which is in business terminology, from the way in which the test system interacts with the system under test. Way too often, and most commercial test frameworks I’ve seen do this, those ideas are conflated: you express the test and how you interact with the system at the same time, so you do neither very well. Business users can’t really read what the test is, and the tests break all the time because the concerns are conflated; they are fragile. So I think it’s really important to apply good design principles to designing the test cases. We had some wonderful experiences with this strategy at LMAX.
For example, at one point, early in its life, we went from a text-based, typical trading-blotter style user interface to a point-and-click graphical interface for placing orders, and all of our tests kept running through that transition, because the tests were expressed in terms of “place an order”, “what’s the state of the order book”, those sorts of things. Those concepts still made perfect sense whether you typed a number into a spreadsheet-like thing or clicked on a graph. The concepts are still right, and that’s why the level of abstraction is important. I am a big fan of whole-system, black-box automated tests using a DSL that are glued into the requirements process. You have stories that describe acceptance criteria, and our definition of done, from a development point of view, is that every acceptance criterion in the story has at least one automated test associated with it. We review that as part of the showcase for the story, and that process quickly builds up a very thorough suite of tests exercising every behavior of your system, because every behavior of your system has a story that told you to implement it.
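A minimal sketch of that separation, with hypothetical names throughout: the test case speaks only the domain language (`place_buy_order`, `assert_best_bid_is`), the DSL translates that into calls on a driver, and each driver knows how to reach the system through one channel. The toy order book and both drivers are assumptions; the point is that the same test case runs unchanged against a text-command driver and a click-on-a-price-ladder driver.

```python
from typing import List, Optional, Protocol, Tuple


class OrderBook:
    """Toy system under test: only tracks resting buy orders."""

    def __init__(self) -> None:
        self.bids: List[Tuple[int, int]] = []  # (price, qty)

    def add_bid(self, price: int, qty: int) -> None:
        self.bids.append((price, qty))

    def best_bid(self) -> Optional[int]:
        return max((price for price, _ in self.bids), default=None)


class TradingDriver(Protocol):
    """Channel-specific plumbing: how a test reaches the system."""
    def place_buy(self, price: int, qty: int) -> None: ...
    def best_bid(self) -> Optional[int]: ...


class TextBlotterDriver:
    """Enters orders as typed commands such as 'BUY 2 @ 105'."""

    def __init__(self, book: OrderBook) -> None:
        self.book = book

    def place_buy(self, price: int, qty: int) -> None:
        side, qty_text, _, price_text = f"BUY {qty} @ {price}".split()
        if side == "BUY":
            self.book.add_bid(int(price_text), int(qty_text))

    def best_bid(self) -> Optional[int]:
        return self.book.best_bid()


class PriceLadderDriver:
    """Enters orders by 'clicking' a price ladder where each tick is 4 pixels tall."""
    PIXELS_PER_TICK = 4

    def __init__(self, book: OrderBook) -> None:
        self.book = book

    def place_buy(self, price: int, qty: int) -> None:
        y = price * self.PIXELS_PER_TICK                    # where that price is drawn
        self.book.add_bid(y // self.PIXELS_PER_TICK, qty)   # what a click there submits

    def best_bid(self) -> Optional[int]:
        return self.book.best_bid()


class TradingDsl:
    """The test-facing language: business terms only, no UI details."""

    def __init__(self, driver: TradingDriver) -> None:
        self.driver = driver

    def place_buy_order(self, price: int, qty: int = 1) -> None:
        self.driver.place_buy(price, qty)

    def assert_best_bid_is(self, expected: int) -> None:
        actual = self.driver.best_bid()
        assert actual == expected, f"best bid was {actual}, expected {expected}"


def best_bid_is_highest_buy_price(dsl: TradingDsl) -> None:
    # The test case itself: readable by the business, silent about the channel.
    dsl.place_buy_order(price=100)
    dsl.place_buy_order(price=105, qty=2)
    dsl.assert_best_bid_is(105)


# Same test case, two different ways of driving the system.
for driver_type in (TextBlotterDriver, PriceLadderDriver):
    best_bid_is_highest_buy_price(TradingDsl(driver_type(OrderBook())))
```

Swapping the driver, not the test case, is what lets a suite built this way survive a user-interface rewrite.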
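The definition-of-done rule can be checked mechanically. This sketch assumes each automated test is registered against the acceptance-criterion IDs it covers; `Story`, `CriterionIndex` and the `STORY-42/AC1` ID scheme are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Story:
    key: str
    acceptance_criteria: List[str]  # criterion IDs, e.g. "STORY-42/AC1"


@dataclass
class CriterionIndex:
    """Which automated tests claim to cover which acceptance criterion."""
    coverage: Dict[str, List[str]] = field(default_factory=dict)

    def register(self, criterion_id: str, test_name: str) -> None:
        self.coverage.setdefault(criterion_id, []).append(test_name)


def uncovered_criteria(story: Story, index: CriterionIndex) -> List[str]:
    """Criteria that still lack an automated test."""
    return [c for c in story.acceptance_criteria if not index.coverage.get(c)]


def is_done(story: Story, index: CriterionIndex) -> bool:
    # Definition of done: every acceptance criterion has at least one automated test.
    return not uncovered_criteria(story, index)
```

A check like this only confirms that a test exists for each criterion; reviewing whether the test really exercises the behavior is still the job of the showcase.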
There are other strategies for microservices and so on, and I think there are often efficient ways of doing that kind of integration testing, but I tend to be a little wary of them myself. As I said before, it’s about trying to apply the scientific method and make sure my software is viable: how do I know, other than by guessing, that the dependency between this service and that other service isn’t going to mess me up if I get it wrong? If I just make an assumption, I can’t really know; if I exercise it, it may tell me something different. However, I am ever the pragmatist, so if I came across a project where it made sense to test smaller pieces, maybe as a trade-off because whole-system deployment makes the feedback cycle too long, or because you’ve got too many test cases with variants in the numbers of different components, I’d probably compromise, but I would feel uneasy about it.
I can’t honestly say that I do. Part of my difficulty, I guess, is that I have been doing this for a long time, and there were no tools targeted at this sort of thing. There are some now, but often the projects I’ve worked on predate these tools, and once you’ve got something that works, it doesn’t matter. In general, I think the mindset, the approach and the process are much more important than the tools; the tools are fairly easy. For example, the DSL I was describing at LMAX we just wrote from scratch. It wasn’t as complicated as it sounds, it was a very simple DSL; it’s the design thinking, the design philosophy, that matters rather than the tools themselves. I’ve heard good things about tools like Cucumber, Twist and SpecFlow, but I confess I haven’t used them myself on real projects, so I can’t vouch for their value personally.
12. On branching: I know you have some strong opinions on branching. Continuous delivery emphasizes the need to develop on trunk as much as possible; on the other hand, tools like Git make branching really easy, and people seem to like that. Have your thoughts changed on this? What do you think about this situation?
No, I’m afraid they haven’t changed; I am a hard-line continuous integration guy. Fundamentally, continuous integration is about exposing integration risk: we want to evaluate our changes alongside everybody else’s changes as early as we can. We want to see what things your colleagues introduce that screw up the work you are doing, or that you introduce that screw up the work they are doing, and we want to surface that so that we can react and fix those problems quickly. Any branching strategy of any kind, even feature toggles, which I use, and branch by abstraction, is designed to hide change, and so they are antithetical to continuous integration. So I guess I am saying they are a bad thing. I wouldn’t say they are wholly evil, and occasionally I would use branching, but it makes me uneasy: if my code isn’t regularly being evaluated in the context of other people’s, I’m at risk. It’s not about the tooling, no matter how good the merge tools are. If you and I are working on separate branches, the tools can be wonderful at merging things together, but functionally we could write code that interacts in bad ways and not spot it; that has nothing to do with the merge tools. It’s to do with design, with surfacing the fact that your head is going in this direction, my head is going in that direction, and the two don’t work together anymore. Continuous integration is the only way I know to get over that problem. It’s not perfect, it has its drawbacks, but for me, as a strong CI guy, it’s worth giving up feature branching to gain that benefit.
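To illustrate why good merge tooling doesn’t remove integration risk, here is a hypothetical pair of changes that touch different lines, so a merge tool combines them without conflict, yet the merged code is wrong. Branch A changes `fee_rate()` to return a fraction instead of a percentage and fixes the callers that existed when it branched; branch B adds a new caller written against the old contract. The function names and figures are invented for the example.

```python
import math


# Branch A's change: fee_rate() used to return a percentage (2.0 meaning 2%).
# Developer A switched it to a fraction and updated every caller that existed
# when the branch was cut.
def fee_rate() -> float:
    return 0.02


# Branch B's change: developer B added a new caller, written against the old
# percentage contract.
def total_cost(notional: float) -> float:
    return notional * (1 + fee_rate() / 100)


# The edits touch different lines, so a textual merge succeeds without conflict.
# Only running the integrated code shows the fee is now 100 times too small.
def fee_is_applied_correctly() -> bool:
    """The kind of check a CI build would run against the combined code."""
    return math.isclose(total_cost(1000.0), 1020.0)
```

On the merged code `fee_is_applied_correctly()` returns `False`. Neither branch’s own tests would see it, because branch A never contained `total_cost` and branch B never contained the new `fee_rate`; only evaluating the changes together surfaces the problem.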
João: About feature toggles, I got the impression you are not too keen on them, also.
Not really. I use feature toggles; I think they are an important tool, an important technique, but again, they’re a problem. If you write some code and you’ve got a feature toggle for that code, what do you do when you run your tests? Do you run them with the toggle turned on, so you’re testing your new features, or with the toggle turned off, so you’re testing what’s going to go into production? Or do you really want both of those answers, because you want to see how your features are evolving and whether they pass the tests, and you also want to test what is going into production, since you might screw something up and break things in production? So which do you do? A few people run both, but then you’ve got problems evaluating the results under the right conditions, and it’s more expensive: running lots of acceptance tests certainly is very expensive. If you want to get your cycle time down to minutes, you invest heavily in hardware, and now you’ve got to double the hardware to run all this stuff. It can get really expensive, really quickly, if you’ve got multiple paths to test. Some people do it, but if you don’t, you’ve got to choose one or the other: do you want to see what has actually changed, or what is going to happen in production? There is no ideal answer, because it’s a form of branching, a form of isolating change, and continuous integration is all about not isolating change. So in the ideal world you want to write your software in a way that evolves in public, live, all of the time, with no need for feature toggles, but sometimes that’s not practical.
13. In your talk, where you described the deployment pipeline you used at LMAX, it was clear that you put a very strong emphasis on an artifact repository. What are the main features you look for in that kind of tool?
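The cost problem can be made concrete by counting the configurations an acceptance stage would have to run. This is a sketch, assuming toggles are simple booleans with a known production setting; the toggle names and the 40-minute run time used below are hypothetical. It compares testing only production, only the new features, both, or every combination, where the last grows as 2 to the power of the number of toggles.

```python
import itertools
from typing import Dict, Iterator, List

Config = Dict[str, bool]


def configurations(toggles: List[str], production: Config, mode: str) -> Iterator[Config]:
    """Yield the toggle settings an acceptance run would need to cover.

    production: only what is going live
    new:        every toggle on, to watch the new features evolve
    both:       production plus all-new (twice the runs)
    all:        every on/off combination (2 ** len(toggles) runs)
    """
    all_on = {t: True for t in toggles}
    if mode == "production":
        yield dict(production)
    elif mode == "new":
        yield all_on
    elif mode == "both":
        yield dict(production)
        if all_on != production:
            yield all_on
    elif mode == "all":
        for values in itertools.product([False, True], repeat=len(toggles)):
            yield dict(zip(toggles, values))
    else:
        raise ValueError(f"unknown mode: {mode}")


def acceptance_minutes(toggles: List[str], production: Config, mode: str,
                       minutes_per_run: int) -> int:
    """Rough wall-clock cost of the acceptance stage on one set of hardware."""
    return minutes_per_run * sum(1 for _ in configurations(toggles, production, mode))
```

With three toggles, all off in production, and a 40-minute acceptance run, `both` doubles the stage to 80 minutes and `all` takes it to 320, which is the pressure to double up on hardware described above.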
The artifact repository is central, because the core idea of continuous delivery is that every change you submit gives birth to a release candidate, and you want to evaluate that release candidate through its release cycle and see whether you can prove that it’s not fit to make it into production. If we’re doing that, we want the thing that ends up in production to be the thing we evaluated, so we want to make sure that the thing we create is the thing we test. So the artifact repository becomes central. The first stage in our pipeline is generally called the commit stage, and it builds the deployment artifacts that we are going to release into production.
The rest of the system, the evaluations that follow, whether automated or manual, decorate that release candidate with metadata describing how it has done: whether it has passed these tests and so on. So fundamentally, what we are looking for in an artifact repository are probably three behaviors. The first is that we can version any binary artifacts; pretty basic stuff, we can have different versions of the same thing, because we are going to be creating lots of them, running tests and generating artifacts all the time. Second, we want to be able to decorate those versions with metadata describing the results of our testing, so that we can capture the results and make automated decisions based on that metadata: yes, it’s passed the commit tests; yes, it’s passed the acceptance tests; yes, it’s passed the performance tests; and so on. And lastly, because this stuff tends to get expensive, it’s quite handy to have purge policies. One of the things with continuous delivery is that if anything fails, to use the Lean term, we stop the line.
In release terms, that means if something fails a test, it’s useless: we can throw it away and discard that release candidate from the artifact repository, which lets us manage the disk space. The first time we built a continuous delivery pipeline, on any project I am aware of, we were naïve and just used Subversion, our version control system of choice, and we blew up the disk in about seven days, because we generate a lot of stuff on every check-in. But I think those three behaviors are all you need. In the past we’ve written our own: you can do this kind of thing simply by writing a few scripts around an area of disk, naming directories, and maybe using little XML files to capture your metadata, and that works fine. A few years ago I wrote some stuff like that, built some Ant targets, and drove it from Ant. These days, on my current project, we use Artifactory, the open source artifact repository, which has the nice property of also being a YUM repository, an RPM repository, so you can integrate it with other things.
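A sketch of “build once, evaluate that exact thing”: the commit stage packages the release candidate once and records a SHA-256 fingerprint, and every later stage confirms it is evaluating the same bytes before trying to prove the candidate unfit. The packaging step and the stage checks are placeholders, and `ReleaseCandidate`, `commit_stage` and `run_pipeline` are hypothetical names.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class ReleaseCandidate:
    version: str
    artifact: bytes
    sha256: str
    passed: List[str] = field(default_factory=list)


def commit_stage(version: str, source: bytes) -> ReleaseCandidate:
    """Build the deployable artifact exactly once and fingerprint it."""
    artifact = b"package:" + source  # placeholder for a real compile-and-package step
    return ReleaseCandidate(version, artifact, hashlib.sha256(artifact).hexdigest())


# A stage is a name plus a check that tries to show the candidate is unfit.
Stage = Tuple[str, Callable[[bytes], bool]]


def run_pipeline(candidate: ReleaseCandidate, stages: List[Stage]) -> bool:
    """Evaluate the same bytes at every stage; the first failure stops the line."""
    for name, check in stages:
        if hashlib.sha256(candidate.artifact).hexdigest() != candidate.sha256:
            raise ValueError(f"{name}: artifact is not the one the commit stage built")
        if not check(candidate.artifact):
            return False
        candidate.passed.append(name)  # metadata recording how far it got
    return True
```

Rebuilding in a later stage would give you something that was never tested; checking the fingerprint makes that mistake fail loudly instead.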
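In the spirit of the do-it-yourself approach described above, here is a sketch of a file-system artifact repository with the three behaviors: immutable versioned directories, a small XML metadata file that later stages decorate with their results, and a purge policy that discards any candidate with a failed stage. The class and method names and the directory layout are assumptions; this is not how Artifactory works internally.

```python
import shutil
import xml.etree.ElementTree as ET
from pathlib import Path
from typing import Dict, List


class ArtifactRepository:
    """One directory per version plus a metadata.xml file (a sketch)."""

    def __init__(self, root: Path) -> None:
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def _dir(self, version: str) -> Path:
        return self.root / version

    def store(self, version: str, name: str, data: bytes) -> None:
        d = self._dir(version)
        d.mkdir(exist_ok=False)  # versions are immutable: never overwrite one
        (d / name).write_bytes(data)
        ET.ElementTree(ET.Element("candidate", version=version)).write(d / "metadata.xml")

    def record(self, version: str, stage: str, passed: bool) -> None:
        """Decorate a version with the result of a pipeline stage."""
        path = self._dir(version) / "metadata.xml"
        tree = ET.parse(path)
        ET.SubElement(tree.getroot(), "stage", name=stage,
                      result="passed" if passed else "failed")
        tree.write(path)

    def results(self, version: str) -> Dict[str, bool]:
        root = ET.parse(self._dir(version) / "metadata.xml").getroot()
        return {s.get("name"): s.get("result") == "passed" for s in root.iter("stage")}

    def passed_all(self, version: str, required: List[str]) -> bool:
        """Automated promotion decision based on the recorded metadata."""
        results = self.results(version)
        return all(results.get(stage) is True for stage in required)

    def purge_failed(self) -> List[str]:
        """Stop the line: a candidate with any failed stage is useless, so reclaim its disk."""
        removed = []
        for d in sorted(p for p in self.root.iterdir() if p.is_dir()):
            if not all(self.results(d.name).values()):
                shutil.rmtree(d)
                removed.append(d.name)
        return removed
```

Candidates with no recorded results yet are kept by `purge_failed`, since nothing has proved them unfit.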
Every time I talk to anybody about this stuff, the thing I try to bang on about is that I think there is a step change happening. We spent decades writing software not very well from a process point of view; if you look at the data, that’s hard to dispute. I think we know how to do it now, our industry is learning, and continuous delivery, amongst other practices, is a better way. There are lots of companies that see this and are making money by doing things better: Etsy, Netflix, Google and people like that know about this stuff and use it to significant advantage. Because this is based fundamentally on the scientific method, on establishing feedback loops and so on, it’s my very strong belief that software development in ten, twenty, fifty or a hundred years’ time will be more the shape of a continuous delivery process than of a waterfall process. It will be iterative, because science works better than anything else at solving hard problems, and that’s how you do science: you iterate. So it will be iterative, it will be experimental, it probably won’t be called continuous delivery, and it may look a little different, but those core ideas will be fundamental to the way it works, because that’s a better way of building software.
João: Thank you so much, Dave.