
Stephen Thair on Enterprise DevOps and Cultural Change


1. Hello. I'm Manuel Pais and I'm here at QCon London 2015 with Stephen Thair, an expert in web performance and also known as one of the DevOps Guys. Thanks for accepting our invitation. Can you briefly introduce yourself to our readers?

My name is Stephen Thair. I come from an operations and infrastructure background, and I've been doing that for about 25 years. I started the DevOpsGuys with my business partner James two years ago to help other organizations manage their online applications, build continuous delivery pipelines and move towards a smoother DevOps way of working.


2. In those organizations, do you still often see a clear divide between Dev and Ops teams? And is that specific to medium or large enterprises, or is it across the board regardless of size?

There is definitely still a clear divide in most organizations. Obviously it's more prevalent in larger organizations. In a startup, the divide between Dev and Ops is normally the width of a desk, because Devs and Ops sit on opposite sides of the same desk. It's far easier to collaborate in that environment. In larger enterprises, development and operations are not only in different buildings but in different countries, with different cultures and different languages, and in many cases at different companies as well. So aligning incentives and getting people to work seamlessly in the face of those challenges is a different problem, but you definitely still see the divide a lot.


3. I'm curious: in your experience, how does what you see in the field compare with the results of surveys like Puppet Labs' State of DevOps report? Do they match?

I really like the stats in the State of DevOps report, but I think it over-reports the number of people who are truly doing DevOps. I think people say "we're doing DevOps" when what they mean is "I've got Chef, or I've got Puppet, or I've got Ansible, or I'm doing some DevOps automation." The other area, probably, is that I don't know the details of their statistical analysis. I think there's a chicken-and-egg correlation problem there: are they doing DevOps because they're successful, forward-thinking organizations with clear business objectives driven by technology enthusiasts? Or are they successful companies because they're doing DevOps?

I definitely agree that you see faster deployment rates. I definitely agree that you see a significant improvement in recovery time: it's just faster to debug a problem and faster to get things fixed. I think you do see cycle time improvements in getting something end-to-end through the chain. So I agree with the outcomes. But I think we need to do more work on averaging that out across a wider population and asking: are we just looking at a bunch of relatively high-performing companies that are high performing for a whole range of reasons, and that's why they're aligned with the DevOps way of working?


4. Do you agree with the idea of enterprise DevOps in the sense that enterprises have particular needs and challenges that can't be solved with DevOps practices applied in smaller companies? Or is it just that they require a longer, slightly different road map to [achieve] the same goals as smaller companies?

I think enterprise organizations definitely need a different road map. Again, this comes back to the conversation we have in the context of Agile: cargo cult Agile. Am I just going through the motions? "I'm doing stand-ups, I'm doing sprints, I have product owners." And with DevOps: "I'm doing all these cool things with automation, I have this amazing monitoring stack with Elasticsearch and Graphite and all those sorts of things."

The road map to get to DevOps has to be the road map that works for you. If I'm in a small organization with only 50 or 100 people, the organizational challenges I face are very different. I had a fascinating conversation with a guy leading a DevOps initiative for an investment bank, and he said: "I've got 28,000 people in the technology organization, in an organization of 120,000 employees." He has a completely different set of challenges. The goal and the objectives are still the same: he wants to create seamless ways of working, he wants to reduce waste, he wants to reduce silos. But the mechanisms he's going to have to use, the road map he follows and, more importantly, the resistance and the challenges he faces are going to be very different. So it's still DevOps; it's just that his path to get there is going to be different. And the path for one enterprise might be very different from the path for another.


5. [...] Besides calling the DevOps Guys, what would you say the first step would be?

Manuel's full question: In environments with very strongly siloed teams, it can be very daunting to take the first step on this road map to a DevOps culture, because Devs might not want to take on added responsibilities on the Ops side, and Ops people might be afraid of automation taking away their jobs. What would you suggest as a first step for someone inside an enterprise, or any other company, in that kind of situation, where there are silos and they want to improve things but there are these opposing forces? Besides calling the DevOps Guys, what would you say the first step would be?

The second step after calling the DevOps Guys, obviously! In trying to address DevOps, I think the first thing to recognize is that DevOps really has to be bottom-up and top-down at the same time. If you get some people enthusiastic about DevOps and they're doing some proof of concept (POC) work and it's really cool, it'll tend to be technologically focused, it'll reach a certain point, and then it will kind of die because it isn't tied to business need and business value. Conversely, if you just have senior management pushing down and saying "we're now going to implement DevOps", the same way they'd say "we're going to be Agile from tomorrow", "we're going to be DevOps from tomorrow", people are going to put up massive resistance. There are frameworks for managing business change. One of those is by John Kotter, and it has eight steps. Two of those steps are getting senior stakeholder involvement and making sure that you start small and plan for quick wins. So see how you can take a project or an environment, apply the principles with that team and demonstrate the value. Try to make something that people can touch and feel, so they can say: "yes, I can check something in here and it goes through" (maybe only as far as UAT, not all the way to production in the first instance) "and here it is in this environment; it's been deployed, it's seamless, and people are working together." And you gather that feedback. But most importantly, if I'm going to gather that feedback at the end, I've got to have gathered the metrics along the way and be very clear about what I'm measuring and what the improvement is. That's where the senior management stakeholder comes in.
What is it that you want? Tell me the metrics you want me to shift, and then I'll get to the end of the POC and say: right, I have now shifted these metrics; is this something you want to encourage, promote and push out to the rest of the organization?
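The answer above turns on agreeing the metrics with the stakeholder up front and showing how far they moved by the end of the POC. A minimal Python sketch of that before/after comparison follows. The `Metric` type, the metric names and the figures are hypothetical illustrations, not anything described in the interview, and the `lower_is_better` flag is an assumption about how each metric should be read.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Metric:
    """One metric agreed with the senior stakeholder before the POC starts (hypothetical)."""
    name: str
    baseline: float        # value measured before the POC
    result: float          # value measured at the end of the POC
    lower_is_better: bool  # e.g. True for lead time or recovery time


def improvement(metric: Metric) -> float:
    """Relative improvement as a fraction; positive means the metric moved the right way."""
    if metric.baseline == 0:
        raise ValueError(f"baseline for {metric.name!r} must be non-zero")
    change = (metric.result - metric.baseline) / metric.baseline
    return -change if metric.lower_is_better else change


def poc_report(metrics: list[Metric]) -> dict[str, float]:
    """Map each agreed metric to its relative improvement, rounded for reporting."""
    return {m.name: round(improvement(m), 3) for m in metrics}


if __name__ == "__main__":
    # Hypothetical figures for illustration only, not data from the interview.
    agreed = [
        Metric("lead_time_days", baseline=20.0, result=5.0, lower_is_better=True),
        Metric("deploys_per_month", baseline=2.0, result=8.0, lower_is_better=False),
        Metric("recovery_time_hours", baseline=6.0, result=1.5, lower_is_better=True),
    ]
    for name, delta in poc_report(agreed).items():
        print(f"{name}: {delta:+.0%}")
```

With these illustrative figures the script prints `lead_time_days: +75%` as its first line. Making the direction explicit per metric avoids reporting a drop in recovery time as if it were a regression.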


6. [...] In those cases in particular, do you think it can ever make sense as a first step, to build a bridge to a more DevOps-y culture?

Manuel's full question: I would like to ask whether you think a DevOps team, a dedicated team, can ever make sense. In particular, you gave the example of that organization with thousands of employees just in the IT department, and sometimes operations are even outsourced to another company. So, in those cases in particular, do you think it can ever make sense as a first step, to build a bridge to a more DevOps-y culture?

So, on the question of whether a DevOps team is an anti-pattern: we wrote a blog post on DevOps anti-patterns, and a number of other people have said that having a DevOps team is an anti-pattern. I'd agree: having a DevOps team is an anti-pattern. Having a "platform" team, however, who are responsible for engineering a robust continuous delivery pipeline that your DevOps initiatives can consume, is not necessarily an anti-pattern.

We keep coming back to certain prerequisites, things that you need. You need flexible computing resources, whether that's a public or a virtual private cloud. You need access to things like Jenkins or TeamCity for continuous integration. You need access to tools for infrastructure automation and configuration management, such as Puppet, Chef or Ansible. And whilst it's nice to say "I'm going to let all the teams pick their own tools", that's difficult to sell in larger enterprise organizations.

So you have a platform team that is in constant communication with all the teams, saying: "actually, we've put together this platform, if you follow this mechanism..." But the key thing is that they have to be practitioners, drawn from people who are well known in the environment. They have to be continually talking to the teams. It's not the model you get in, again, large investment banks, where there's an architecture team with this big foot that comes down and says "Oh, you must do it this way", because that definitely won't work. But I think the concept of a team that is there as a resource to help other people is a good pattern: think of them as your AWS team or your Google team, building the tools, the tool chain and the platform to help you deliver your DevOps initiatives. Maybe they're more heavily involved in your proof of concept, and then they step away as time goes on.


7. Do you believe continuous delivery and DevOps are the missing pieces for Agile? To not only embrace changes but actually deliver them frequently to the customers? Is that what's missing to achieve that holy grail of business and IT alignment?

So the relationship between DevOps, Agile and continuous delivery is really interesting. We had a really good debate about it in the DevOps track, for those who haven't seen the session videos yet. In my slides, continuous delivery sat underneath, with DevOps as the parent. In Dave Farley's presentation, he viewed continuous delivery as the end-to-end thing, with DevOps being a part of that.

Are they the secret sauce? No, in the same way that stand-ups and retrospectives and all the artifacts and rituals of Scrum aren't necessarily a solution to Agile. I struggle to see how you can do continuous delivery and DevOps if you aren't embracing Agile software delivery, lean principles and small batch sizes. It's so focused on iterative improvement: you've got to have a small iteration size, whether that's a two-week Scrum sprint or whatever.

At the same time, if I'm producing lots of software but I've reached a point where I've got this big release backlog, and my stuff isn't getting out into production, into the hands of the end users, then I'm not delivering business value. Working software in the hands of users has to be the end goal. And so you need Agile, continuous delivery practices and DevOps practices to get it to the end user.


8. You already mentioned that, to kick-start DevOps initiatives in a large organization, you need both bottom-up and top-down involvement. I was wondering: when you go into an organization to help them, do you encounter recurring smells, things that you can easily analyze and recognize as a common problem?

Yes. Among the common problems we face, there are two main blockers to DevOps initiatives. The first one is the finance model. I have a project-based finance model built on calculating a business case and return on investment, which wants everything defined upfront: I've got to know how long it's going to take, what's going to be delivered and how much it's going to cost so I can calculate my ROI. In that model, it's very difficult to do iterative Agile delivery, and it's very difficult to experiment. So we urge people to move towards funding product teams. Don't fund projects and initiatives; say “actually, we know what we're going to be doing, and I know we're going to need three scrum teams or five scrum teams”. Okay, so fund the five scrum teams and feed work into them.

The second blocker we see is technical debt, in particular technical debt around regression and test automation. We say we want to move this through really quickly, yeah, but it takes us three days to do a test cycle because our testing is completely manual. You can't just push stuff through, because if you do, it's not going to have the quality you want, it's going to break things in production, and therefore DevOps and continuous delivery are going to get a bad name.

So finance model and technical debt, particularly technical debt around test automation.


9. In your talk here at QCon, you focused on technical and organizational elements of a successful DevOps environment. Can you give us a brief rundown of what those elements are and is there any particular order in which you would try to introduce them at an organization new to DevOps?

So in terms of the organizational challenges around DevOps, the first thing I would try to do is flatten communication. How can I get all of those communication channels in place? Whether it's Slack or HipChat for chat, or whatever tools you want to use there, and then going up through daily stand-ups, weekly meetings and town halls: the more you can have communication flowing seamlessly throughout the organization, the better. We also do a lot of work on automation, so that when somebody checks in, it appears in the chat channel; when somebody runs a build, it appears in the chat channel; when somebody deploys to production, it appears in the chat channel.

Creating this awareness of what's going on and getting those communication channels in place matters, because once people can connect, they can start to self-organize. They can start to say “oh, you're the expert on that” or “you've got a real interest in that”. And that is what starts to lay the groundwork for change.
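The chat-channel automation described above (check-ins, builds and deployments all appearing in the team channel) could look something like the following standard-library Python sketch. It assumes a chat tool that accepts a JSON `{"text": ...}` payload posted to an incoming-webhook URL, as Slack's incoming webhooks do; the event kinds, the message templates and the webhook URL are hypothetical and would come from your own CI server's hooks.

```python
import json
import urllib.request
from dataclasses import dataclass


@dataclass(frozen=True)
class PipelineEvent:
    kind: str    # hypothetical event kinds: "checkin", "build" or "deploy"
    actor: str   # who triggered the event
    detail: str  # commit message, build status, release name, ...


# Hypothetical message templates, one per event kind.
TEMPLATES = {
    "checkin": "{actor} checked in: {detail}",
    "build": "{actor} ran a build: {detail}",
    "deploy": "{actor} deployed to production: {detail}",
}


def format_message(event: PipelineEvent) -> str:
    """Render one pipeline event as a single line for the team chat channel."""
    template = TEMPLATES.get(event.kind)
    if template is None:
        raise ValueError(f"unknown event kind: {event.kind!r}")
    return template.format(actor=event.actor, detail=event.detail)


def post_to_chat(webhook_url: str, event: PipelineEvent, timeout: float = 5.0) -> int:
    """POST the message as JSON to an incoming webhook and return the HTTP status code."""
    body = json.dumps({"text": format_message(event)}).encode("utf-8")
    request = urllib.request.Request(
        webhook_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

A CI job might call `post_to_chat(url, PipelineEvent("deploy", "alice", "release 1.4.2"))` after a successful deployment. Keeping `format_message` separate from the HTTP call means the wording can be checked without a network connection.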

The second organizational thing, which we touched on in one of the earlier questions, is incentives. If operations is incentivized for stability and development is incentivized for change, then we've got [a clash] from the start. Some organizations have talked about swapping those around, so the operations people are incentivized to change things and get them into production, and the development guys are incentivized to ensure stability and avoid breaking things, which improves their code quality.

But I think it's about understanding what you want to achieve and making sure everybody's on the same page. And if I'm automating elements of your job, to me that should free you up to do the value-added, higher-value work. Some organizations have been doing really interesting things along the lines of: “okay, you don't need to do that thing anymore, so we're going to start taking on work from outside, from other companies, because we've now got spare capacity. We're not going to let anybody go. We're going to add more value; we're going to do more stuff.”


10. On the technical side, what kind of things would you start with?

What would I start with on the technical side of a DevOps initiative? If you can't use public cloud, which many organizations can't, I think giving developers a self-service cloud is a good start. Make it easy for them to obtain compute resources, perhaps from a template or a pattern: "Okay, I want to create this, and it'll run this particular application." If you don't renew the lease, the whole system shuts down after 30 days, and it's billed back to your project or your team's cost center.
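The self-service model in this answer (provision from a template, bill a cost center, and let the environment shut down unless the lease is renewed within 30 days) can be modeled roughly as below. This is a hypothetical Python sketch of the lease bookkeeping only: the `Lease` type and the function names are illustrative, and the actual provisioning and shutdown would be calls to whatever private-cloud API an organization uses, which are not shown.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

LEASE_PERIOD = timedelta(days=30)  # the 30-day lease mentioned in the interview


@dataclass
class Lease:
    environment: str     # name of the template or pattern the environment was built from
    cost_center: str     # where the compute cost is billed back
    expires_at: datetime

    def renew(self, now: datetime) -> None:
        """Extend the lease by another full period, counted from the renewal time."""
        self.expires_at = now + LEASE_PERIOD

    def is_expired(self, now: datetime) -> bool:
        return now >= self.expires_at


def request_environment(template: str, cost_center: str, now: datetime) -> Lease:
    """Self-service request: record a new environment with a fresh 30-day lease.

    Hypothetical: real provisioning against a cloud API would happen here.
    """
    return Lease(environment=template, cost_center=cost_center, expires_at=now + LEASE_PERIOD)


def expired_leases(leases: list[Lease], now: datetime) -> list[Lease]:
    """Return the leases whose environments a scheduled job should shut down."""
    return [lease for lease in leases if lease.is_expired(now)]
```

A scheduled job could call `expired_leases(all_leases, datetime.now())` once a day and shut down each returned environment; renewing simply pushes `expires_at` a full period forward from the renewal date.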

So flexible compute would be the first thing. Then robust monitoring: in this morning's keynote from Google on cluster management, the third of the speaker's three key takeaways at the end was about robust monitoring. You have to have monitoring in place to tell you what's going on and to measure change. So if you don't know whether your application is up or what resources it's consuming, if you don't know how many developers are working on what, who's checking in and what state the build is in, and you don't know the metrics across the entire software development life cycle, then I would make putting those things in place a key priority.


11. And what are the typical reactions you get when you introduce those changes? And are people more favorable to technical changes or cultural changes?

So in terms of the reaction, I think, being technologists, there's a real... We've talked about the CALMS model, haven't we? Culture, Automation, Lean, Metrics and Sharing. I see many organizations that have kind of forgotten the C, L, M and S. It's all about the A, because we're technologists, and automation is tangible: we can touch it.

I think that we, again as technologists, aren't trained in cultural change, and that's why I keep saying: go out to the business literature, go out to the change management literature, go and do an MBA or something, and learn more about how they analyze a culture and how to implement cultural change.

That said, in my presentation I made a very personal point. Five to ten years ago, in my 25-year career, I was going to leave the industry because I just couldn't work like this anymore; I hated it. I was stuck in large organizations and I didn't see the value. And even though, as technologists or as individuals, we might not know how to drive cultural change, and we might be frustrated, I think there's a real desire: “I want to work better, I want to be a professional, I want that autonomy, mastery and purpose in my working life”.


12. Another question, more on the technical side. How important is the adoption of a common DevOps tool chain across an organization? And can the tools, in some situations, actually have the adverse effect of increasing silos, in the sense that you might need people with specific technical skills or knowledge of these tools to do the job? Have you ever seen that?

Does having a specific tool chain have an impact? Coming back to one of my previous answers, about having a platform team and a shared tool chain, I think that's important to get you started.

That said, I think experimentation is also really important. Somebody might say: “actually, we're finding Jenkins really, really difficult and we'd like to try TeamCity”, or “I want to start experimenting with Docker”, or “Puppet is too much overhead” or “Chef is too much overhead, and I want to use Ansible”.

A culture of experimentation allows that diversity. But again, there's a difference between doing something because it's technically cool and doing something while saying “this is an experiment”. Dave Farley talks a lot about this idea of the scientific method. I'm going to run an experiment and ask: “can we get better productivity or better delivery on these metrics using this tool set for the lifecycle of this project?” Then at the end I'll decide whether that's the right choice for us, rather than just doing something because it's cool.

Do I see silos? There are always going to be pockets of technical expertise that are hard to deal with. I think the challenge for the company's culture is: how do I identify those experts? How do I make it widely known that they are experts? What can I do through apprenticeships, knowledge transfer, internal conferences, communities of practice and birds-of-a-feather sessions so that I can at least diffuse some of that knowledge out into the organization? And, more importantly, am I prepared to fire somebody who's technically very good but doesn't share? The answer to that is yes.

If they're not living up to the cultural objectives of your DevOps culture and your organizational culture, and your commitment to sharing and collaboration, and they're being that person who hoards the knowledge, that is an HR issue; that's a management issue. It's not a technical issue, and there's no technical fix for it. It's actually: you're not the right person. To be honest, I don't really care if you're the only person who knows how this thing runs: go away. I'll take the hit. I'll deal with the technical problem of that technical silo.


13. I guess what you're saying is that it's better to remove the constraint in the system, even though it might take longer, to get a better flow overall?

That's a perfect way of thinking about it. If the constraint is a person, and we talk about bringing the pain forward: I'm going to have a lot of pain now if I get rid of that person, but in the longer term I'm going to end up in a better place, because I'm going to be forced to find a better way around it. So yes, it's a good way of looking at it.


14. For our last question: you are familiar with Conway's law, right? I'm curious, since you have a lot of previous experience in web performance, have you noticed any correlation between products or services with outstanding web performance and the topology of the teams that produced them?

I'm just trying to think of examples. I think QCon London 2015 is going to become known as the QCon of Conway's law and reverse Conway: let's reorganize our organizations so that they match how we think. Everybody's been talking about Conway's law and the reverse Conway maneuver.

In the Microsoft study that I referenced in my presentation, one of the things they talked about was the cohesion of the team working on a product and the diffusion of responsibility. How far up the chain do I have to go before I find somebody who's responsible for 75% of this delivery, and what's the rate of churn inside the team? So I have no doubt that some of the best products come from a team that is cohesive and well managed, that has a span of responsibility enabling them to do the things they need to do, and that has strong leadership and a commitment to tackling technical debt.

A few years ago there was a Velocity presentation by Michael Rembetsy on continuously delivering culture at Etsy, and he used the phrase "always living in a sea of engineering filth", which is a fabulously emotive phrase that I keep coming back to. So you've got somebody saying: "I'm tired of living in a sea of engineering filth; I'm mad as hell and I'm not going to take it anymore." And when you've got a team behind that, saying “yeah, we're not going to live in swill anymore; we're not students living in a squat or something, we're software professionals and operations professionals, and we're going to have that commitment”, then there is no doubt: having a cohesive team of the right size with the right span of control leads to better performance, both in terms of delivery and in terms of application performance.

Manuel: Okay. Well, thank you very much, Steve.

Thank you very much.

Jun 14, 2015