
DevOps & Lean Thinking Panel


Summary

The panelists confront deep questions like "How do you DevOps right?" and "Is testing waste?" Find pointers about selecting incident commanders, DevOps under auditing constraints, and low-overhead deploy coordination.

Bio

Jessica Kerr is a Polyglot Functional Developer at Atomist. Matt Stratton is a DevOps Advocate at PagerDuty. Bridget Kromhout is a Principal Cloud Developer Advocate at Microsoft. J. Paul Reed is a Build/Release Engineering, DevOps, and Human Factors Consultant. Greg Burrell is a Senior SRE at Netflix and a member of the Edge Developer Productivity Team. Holly Allen works in Service Engineering at SlackHQ.

About the conference

Software is changing the world. QCon empowers software development by facilitating the spread of knowledge and innovation in the developer community. A practitioner-driven conference, QCon is designed for technical team leads, architects, engineering directors, and project managers who influence innovation in their teams.

Transcript

Kerr: Thank you, everyone, for coming to the DevOps & Lean Thinking Panel. I'm Jessica Kerr. I work at Atomist, and I will have the panelists introduce themselves a little bit. My open question to you is: who are you? What are you doing here? And please name at least one thing that you have strong opinions about. Holly, I will start with you.

About the Panelists

Allen: I'm Holly Allen. I work at Slack. I have been in pure software Dev roles and DevOps roles. Currently, I'm on the service engineering team at Slack, which at another place might be called the operations team. A thing that I have a strong opinion about is that language and technology shaming is completely toxic to our whole industry. You should not give anyone a hard time for liking, using, being an expert in, or wanting to use any particular piece of technology, because nothing's better than anything else. We're all just trying to get some work done.

Burrell: My name is Greg Burrell. I work at Netflix. I just finished my talk upstairs. I've been at Netflix for over 13 years, so I've seen a lot of change there. A lot of things happening. It's been exciting. It's been an evolution. Something I feel really strongly about is company culture. It's very important. It doesn't matter how many brilliant developers you have, or what great tools you have; if you don't have the company culture to support them, it's just not going to work out.

Reed: My name is J. Paul Reed, and I work for myself as a consultant. I've been doing that for about six years. The job Jessica said I was doing, build/release engineering, is what I did before it was called DevOps. I have a lot of strong opinions about releasing. Any ex-build/release engineers in the room at all? No? Yes, it's a dying breed.

Kerr: Is there such a thing as an ex? I mean, is there an exit?

Reed: Well, yes, no, there's not. You could hit the nightmares alone. I also just finished my master's in human factors and system safety, so this discussion of language and company culture is also something I have thoughts and feels on.

Stratton: I'm Matt Stratton. I'm a DevOps advocate at PagerDuty. I focus on human ops, so the psychology of on-call, all that kind of good stuff. I've spent most of my career as a system administrator, so most of that time was spent running infrastructure.

Reed: And sysadmins?

Stratton: Yes, any sysadmins? SysOps even, if we want to really date ourselves back in the day. I have very strong opinions on this. I was going to make a joke about ramen or time beef, but now I feel like that would make me look like an ass. What I really do have strong opinions on, though, is burnout: preventing it, recognizing it, and understanding how we can work as a team to prevent burnout in our organizations.

Kromhout: Hi, I'm Bridget Kromhout, and I work on the cloud advocacy team at Microsoft. I think what I have the strongest opinion about, and this kind of harkens back to Holly's, is that when I tell people I work at Microsoft, they're just like, "But you don't do Windows." So I'm like, "Nope. And you know what? Having assumptions is, well, we all know who it makes an ass out of, and it's probably you."

Kerr: Let's start with audience questions. I know we have at least three.

How to Do DevOps right

Man 1: I'm Christophe, director of engineering, emerging technologies. I have a question about transforming organizations to be more DevOps-enabled, I guess. The problem I'm seeing, and this is in our company too, is that in order to become more DevOps-aware, we just hire more people with the title DevOps, and that's supposedly the magic that's going to make this happen. So, any opinions on how to truly do DevOps right, without just pretending that you're doing DevOps?

Kromhout: I see Matty laughing, and I want to hear what you think about this, because I actually podcast with Matty and he named the podcast "Arrested DevOps." So he gets to tell us what DevOps-ing is.

Stratton: So how do you do DevOps right? And the first thing I want to point out to you - so you said hiring more people with the title DevOps engineer, and I've decided to stop tilting at that windmill about calling people DevOps engineers, mostly because I think the statistic I heard recently is that DevOps engineers make 30% more than system engineers. So go get that money, right? I mean, if changing your title is going to give you more money, then that's great.

Kromhout: As long as nobody expects you to be hired to do all the DevOps-ing. And what they mean is, you are going to do all the collaborating and you're going to do everything that enables the org, but no one else is going to work with you on it.

Kerr: How does one person collaborate?

Stratton: Exactly. It's one hand clapping. It's very zen.

Kerr: And it looks like a duck and it's not very useful.

Stratton: The thing, though: if you're familiar with the term "code smell," which I imagine, at a conference like QCon, most people have heard of, I usually say that having a DevOps team, or sometimes calling folks DevOps engineers, is a DevOps smell. What that means is that there's not necessarily anything inherently wrong with it. But what it usually means is you think DevOps means automation, full stop. So if all your DevOps engineers do is write Chef cookbooks and Ansible playbooks and build out infrastructure automation or deployment automation, you're not DevOps-ing, you're automating, right? Which is great; that's part of it.

But I think when you're DevOps-ing correctly, again, it's about the collaboration, but really, it's about aligning goals. Nicole and Jess talked about that this morning, when we think about looking at the macro goal versus the micro goal. And that's really step one: getting an understanding. I always tell folks, and then I'll let Paul jump in here, do you know how your company makes money? If you don't, go find out; we'll wait. Because if you can't answer that question, then you can't make any of these other macro decisions, and you're going to be making decisions that are very focused on your particular role.

Reed: I was just going to point out, we are in the DevOps & Lean Thinking track. We have all, I think, been around long enough. "DevOps-ing" or "doing the DevOps" is kind of a joke, but people use that language. The point is that people don't think about the fact that a lot of the concepts are underpinned by Lean. There's some Theory of Constraints thinking; there are some larger theories about how work is done, backed up by research that has been applied to IT.

So if you don't take a look at that, and you allow things like power dynamics where one group of people doing the work is treated as better than another group, that's a problem. And that's traditionally been a problem in IT. If you don't solve those problems, and you think you are just going to hire more DevOps people, then that's a major hurdle, right? You're not going to understand why kanban matters if you don't understand pull systems and flow of work, and those feedback loops and that kind of stuff.

Stratton: I just want to echo that point about treating one group better than the other. We had an episode of our podcast where, if you know to listen for it, you can almost hear through the computer our other cohost telling me to be quiet, because our guest was saying, "Well, the great thing about DevOps is it saves time for my most valuable resources, the software developers." And I was like, "Oh, if you were not a client." So yes, I think it's a level playing field. Well, it should be. It has to be level or else it will be tilted.

Kerr: Yes, I would argue that DevOps is a socio-technical system. I mean, all of software development is a socio-technical system, and DevOps is a particular way of making that system more effective, but you have to change both socially and technically. That doesn't mean adding a person; it means really changing the interactions at the social level as well as the technical level. Oh, and were you in Baron's talk today? There's one speaker in this track who's not on this panel, and it was Baron Schwartz. He talked about bringing DevOps to the database, but it was way more about how to get that transformation and also how to screw it up. So you might want to check out that video as well.

Kromhout: I want to jump in really quickly and point out that I think there's a lot of social pressure to do the DevOps right. You've got to finish your DevOps transformation by the end of Q3 so it can be in the Q3 results, and it's a journey, all right? Sorry, did this die again? I think this one might be dying. All right, you can't just say, "We need to DevOps for Q3," and then it'll be in the results and everybody is going to get promoted and level up or whatever. That's not really effective. It's effective to think about it as a journey of continuous improvement. So, to wrap it back to your initial question: yes, hire people, maybe retask people who already exist, and then just keep iterating, because it's not like any reasonable organization is going to say, "Huh, we finished improving. We never have to improve again." Like, really?

Stratton: Everybody knows that you finished DevOps in Q2.

Reed: So I would point out, though, to my earlier point, all of that comes from the Toyota Production System. That's why Toyota won in the '80s, while everybody else thought, "Oh, we'll just take what Toyota did and copy it," and lost in the '80s. So again, the things we suggest you do in the DevOps-ing are couched in that historical context and exist for historical reasons.

Incident Management

Man 2: Hi, I'm Jason from Sterling Bank. And I have a question about incident management. So we follow and largely use the PagerDuty incident management processes.

Stratton: I do that as if I wrote them.

Man 2: So my question is this. When an incident starts up, something goes wrong, and people say, "Right, this is an incident." It's a sev one or sev two because a major piece of functionality has disappeared. Let's declare an incident. We open an incident channel in Slack, people go in there, and someone says, "Right. Well, it's a major incident because it's a sev one, so we have to have an incident commander." Someone declares themselves incident commander and then immediately starts trying to diagnose the problem, read through logs, and so on. And I'm sitting there going, "No, wait, you're the incident commander. You should be stepping back and letting someone else do the diagnosis. Why have you jumped in there?" So the question, I suppose, is how do we get better at that instantaneous role separation? When we're stepping into incident mode, how do we make sure that people adopt roles and then follow those roles through, rather than trying to do two things at once?

Allen: Do you do retros on your incident process?

Man 2: Not till now. We do retros on the incidents, but not on the incident process.

Allen: I highly recommend you bring the exact same kind of thinking over to your processes and ask yourself, "Well, how did we handle that incident? What could we do better next time? How did the incident commander do?" Provide an opportunity and a safe space for incident commanders to coach each other, because it's a hard job, and to give each other that feedback, right? Do the same thing for your post-mortems. All these process retros are really, really helpful.

Kerr: I like that you had an incident in your incident so you need to retro your retro.

Stratton: The other thing, too: at PagerDuty we used to have a kind of election of incident commanders, and if you've ever used Raft leader election or things like that, you know that election is hard. So we've actually stopped doing it. We have a separate rotation for incident command, and there are very few engineers in the incident command rotation. The ones who are in it are usually leads or managers. That helps very much.

For people who aren't as familiar with the incident command system, it doesn't come from software. The analogy I always use is what firefighters do. In firefighting, the incident commander wears a white helmet, and they have a saying: "If you see somebody in a white helmet pick up a wrench, take the wrench away from them and hit them in the head with it." Because you're not supposed to be diagnosing the problem. You're supposed to be making decisions.

And this is one of the problems with self-selecting: first of all, someone declaring themselves incident commander is not necessarily the best person to do that, because you don't know. It also means they have to be able to do the hard thing, which is realizing when they need to stop being the incident commander. Something that happened at PagerDuty, and I recommend this, was about a year and a half, almost two years ago. One of our product folks said, "Well, why can't I be an incident commander?" Because up till then they had all been engineers. So Rachel Burn was our first non-engineer incident commander, and now we have 30 people in the incident command rotation who are not engineers.

And it makes a pretty big difference, so that's another thing I would look at. I could not agree more that you should be doing the retrospective on the incident process to see where that's happening. But some immediate tips to start thinking about: have a separate rotation specifically for incident command, and keep the shifts short, because incident commanders usually get called more than subject matter experts.

Another thing about the retrospective, something we started doing that I think is really cool: we have something called RetroDuty, because at PagerDuty we call everything "something-Duty," because of what we do. You can go into this Slack channel and, for anything, basically say, "I need someone to lead a retrospective on X," just something you did. There's a group of people who are good at that who will come and lead retros for you and with you. Having somebody lead a retro who did not participate is helpful, and it's hard to do with incidents because ...

Kerr: I like that the person, instead of saying, "I have a problem with X," you say, "Can we reflect on this?"

Burrell: I'd like to add one thing, and that is you really shouldn't separate your incident process from the review. Part of reviewing that incident is saying, "All right, we know what went wrong, we know how to fix it, we know what to do better next time. But how did we do? How did we do as a team? How did the incident commander act? How did everyone respond to that incident commander?" Oftentimes we see that responding to an incident is as much about the people involved and how they interact with each other as it is about technology and tooling.

Reed: Just to tack on to the last part of that: if you are having a problem with your incident response process, it is surprising how good a first-order approximation of what to do you get by looking at what the fire department does. A common thing I've seen is that there's an incident and the team swarms. The fire department, when you call them, does not swarm on the fire, right? They don't have a discussion in front of your house: "Well, it's burning, who's the incident commander?"

So when I work with customers and clients on this, it's like: as a first-order approximation, it's not going to be perfect, but walk yourself through what the fire department does when you call, and you'll get a good answer. That's an exercise you can do with your teams if you're not getting the outcomes or the behaviors you want. Just kind of tabletop it, because everybody sort of knows what the fire department does.

Stratton: At least they know what they don't do.
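Stratton's two immediate tips, a dedicated incident-command rotation kept separate from the subject-matter experts and short shifts, can be pictured as a simple round-robin schedule. The following is a minimal hypothetical sketch, not PagerDuty's tooling: the function names, the commander names, and the 12-hour default shift length are all assumptions.

```python
from datetime import datetime, timedelta
from typing import List, Optional, Tuple

# A shift is (start, end, incident_commander); end is exclusive.
Shift = Tuple[datetime, datetime, str]


def build_ic_rotation(commanders: List[str], start: datetime, days: int,
                      shift_hours: int = 12) -> List[Shift]:
    """Round-robin a dedicated incident-command rotation in short shifts.

    Hypothetical helper. The commanders list is assumed to be separate from
    any subject-matter-expert rotation; nothing here enforces that.
    """
    if not commanders:
        raise ValueError("need at least one incident commander")
    if shift_hours <= 0:
        raise ValueError("shift_hours must be positive")
    end = start + timedelta(days=days)
    shifts: List[Shift] = []
    current, i = start, 0
    while current < end:
        # The last shift is truncated so the schedule never runs past `end`.
        shift_end = min(current + timedelta(hours=shift_hours), end)
        shifts.append((current, shift_end, commanders[i % len(commanders)]))
        current, i = shift_end, i + 1
    return shifts


def ic_on_duty(shifts: List[Shift], when: datetime) -> Optional[str]:
    """Return the incident commander on duty at `when`, or None if uncovered."""
    for shift_start, shift_end, name in shifts:
        if shift_start <= when < shift_end:
            return name
    return None
```

For example, three commanders over two days with 12-hour shifts yields four shifts, assigned to the first, second, and third commander and then the first again. Short shifts spread the load that, per Stratton, falls more heavily on incident commanders than on subject matter experts.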

The Security Model

Man 3: My name is [inaudible 00:17:33] from Autotask. This question is for Greg. I like the security model you talked about. Basically, you trust the developer: you give them access to production, which allows them to deploy faster. But is this model questioned by auditors? How do you pass auditing requirements?

Burrell: In my talk, I mentioned that this model doesn't work for everybody, and it doesn't work for every team. That's exactly a case where this model won't really work. On some teams, particularly teams that deal with credit card processing or things like that, you can't give all your developers unrestricted access to all databases and production systems. You can do things to stay in compliance with your auditors, but the 100% full-cycle developer model may just not work there.

Continuous Deployment

Kerr: A question for the audience and the panel, does anyone work at a place where you do something pretty close to continuous deployment, and you do have that kind of audit level where you need an extra check?

Man 4: Yes, we deliver a software development tool, and we deliver a new version of it every day. But we also have what we call a publication: we release it every two months. The difference in between is that we test manually, and we have manual checks in the middle where we really analyze whether it is correct to release. So there is that point in the middle.

Kerr: So there's a delay on that one for people to do extra checks before it goes out. Automated tests can't cover everything.

Reed: There's a document called the DevOps Audit Defense Toolkit, and you can Google it. Gene Kim and a bunch of people got together to write it. It basically talks about strategies you can use in a highly regulated environment that work with continuous delivery and that actually make auditors happier. The DADT. That's a dad joke. The DADT joke.

Kromhout: And maybe something to consider there, too: if you have processes that are highly manual, and every couple of weeks the change control review board sits down and decides what to do and what not to do, keep in mind that if you have a different process for "emergency changes," bad actors like me would just mark everything emergency. I'm pretty sure the statute of limitations has expired on that job from many years ago. That way, I didn't have to wait for the change control review board. So keep in mind that a lot of those processes aren't going to keep you any safer than the process you use when everything is on fire and broken, and that one is good enough then. Maybe just add more automation and safety around that one so that you can use it all the time.

Allen: Slack has customer data in our databases. One of the things we do is that, although we have continuous deployment, only a limited group of people are allowed to actually SSH into those machines. But we have a break-glass command so that you can do that in an emergency: you have to log why, it logs everything you do, and then you leave. That also helps us.
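Allen doesn't describe how Slack's break-glass command is built. As a hypothetical sketch of the pattern she outlines (a mandatory stated reason, every command logged, and an explicit end to the session), a Python context manager might look like this; the class name, the log fields, and the injectable runner are assumptions for illustration.

```python
import datetime
import subprocess
from typing import Callable, Dict, List, Sequence


def _run_locally(argv: Sequence[str]) -> int:
    """Default runner: execute the command locally and return its exit code."""
    return subprocess.run(list(argv), check=False).returncode


class BreakGlassSession:
    """Hypothetical emergency-access session: a reason is mandatory, and the
    session start, every command, its exit code, and the session end are all
    appended to an audit log."""

    def __init__(self, user: str, reason: str, audit_log: List[Dict],
                 runner: Callable[[Sequence[str]], int] = _run_locally):
        if not reason.strip():
            raise ValueError("break-glass access requires a stated reason")
        self.user = user
        self.reason = reason
        self.audit_log = audit_log
        self.runner = runner

    def _record(self, event: str, **details) -> None:
        self.audit_log.append({
            "time": datetime.datetime.now(datetime.timezone.utc).isoformat(),
            "user": self.user,
            "event": event,
            **details,
        })

    def __enter__(self) -> "BreakGlassSession":
        self._record("open", reason=self.reason)
        return self

    def run(self, argv: Sequence[str]) -> int:
        # Log the command before running it, so even a hung command is recorded.
        self._record("command", argv=list(argv))
        code = self.runner(argv)
        self._record("exit", argv=list(argv), code=code)
        return code

    def __exit__(self, exc_type, exc, tb) -> None:
        # Always record the end of the session, even if a command raised.
        self._record("close")


if __name__ == "__main__":
    log: List[Dict] = []
    # A fake runner keeps the demo side-effect free; "INC-123" is made up.
    with BreakGlassSession("oncall-alice", "INC-123: replica stuck", log,
                           runner=lambda argv: 0) as session:
        session.run(["systemctl", "status", "mysql"])
    print([entry["event"] for entry in log])  # ['open', 'command', 'exit', 'close']
```

Making the runner injectable keeps the sketch testable; a real tool would presumably run commands on the target host and ship the log somewhere tamper-resistant, which is outside the scope of this example.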

Is Testing Waste?

Man 5: In your responses to the opening question, I would say you offered strong opinions, but not controversial ones. So I'm going to try to stir up a little controversy here. Paul, you mentioned the theoretical basis behind Lean and DevOps, and one of the things that comes from that is the wastes of Lean. There are seven in the original definition, and an eighth, about rediscovering lost knowledge, has been added more recently. So I would like to ask: is testing waste, and if so, what do we do about it?

Stratton: I don't know if there are any testers in the audience. Hold on. Are there any testers in the audience?

Kerr: They're outnumbered. It's okay.

Reed: It's not waste. Next question. The one thing: I've been thinking a lot about this, and there's been a shift in every industry, and I think in ours, to classify certain types of work as waste. I think release engineering is one of those. I think QA is probably one of those. I think operations is probably one of those. They are things that, from a business perspective, are just assumed to work. "I assume that as a developer, you all write your software 200% correctly, so I don't need to test it, because it's got quality built in. It releases itself, because I can't charge for it if I can't release it. So that magically happens somehow."

And then there was a shift in operations when we started doing the internet and all of that, where a lot of operations folks became release engineers, and they also had to operate the system. I mean, IT Ops departments used to run the Exchange server, and then they got shifted to running the company's product, right, if we go back far enough.

So I think our industry likes to tell itself that these things don't matter. And I always giggle, because a lot of times developers will say, "Oh, I want to deploy. I want to own quality. I want to own whatever." And I'm like, "Okay, great. That's cool. Here is the pager." "Oh, I don't want the pager. I don't want that. I just want to be able to deploy when I want." I was like, "Okay, well, there are tradeoffs there." So I think you could put a lot of things in the same bucket as testing when asking whether it's waste. You're looking at me like you have an answer too, which we should talk about later. But yes, I think that's a pattern. And that's all I had to say.

Kromhout: I just wanted to add to what J. Paul is saying. I really appreciate what Charity Majors of Honeycomb (they're a sponsor here, you should check them out) says about how nines don't matter if the users aren't happy. Of course, I mean, the testing doesn't matter as long as everything goes perfectly and …

Kerr: Or if nobody uses it.

Kromhout: Right. Exactly. It's very easy to imagine a perfect happy path where you don't need anything, where you don't need security because everything is just written securely. It's like, "Okay, you know what, if any of you are working at a place where everything goes perfectly all the time, you definitely should be posting on the #QConSF hashtag and telling people you're hiring, but I'm guessing that's not realistic." So we do need people to own and care about these areas, even if their job titles and job functions change a little over time. Maybe they're not racking and stacking as much anymore; maybe they're writing a bunch of Terraform instead. But they probably do still need to care about that stuff.

Reed: One thing I'll point out, too, is that I've argued this both ways. I worked with a lot of colleagues in release engineering whose major skill was saying, "No, you can't deploy that." And those people aren't going to have a job that much longer. Sorry, that's the way the world is moving. So for all of those people who have deep operational knowledge or deep QA knowledge, it's not that you don't matter; it's that the way you interact with the system has to change to support the changing missions of the organization when it's writing, deploying, and operating software.

Stratton: One of my favorite quotes from Paul here is, "Oh, you're a full stack engineer. When was the last time you wrote a device driver?" Meaning, there's this idea that we can know all of the things and have deep domain knowledge around building software, operating software, and testing software. Yes, we can do all those things, and I think it's great to be T-shaped, but domain knowledge super matters. And to quote the great Ron Swanson, "Never half-ass two things. Whole-ass one thing." So be really good. Know where you're at.

Continuous Integration and Delivery

Man 6: I've attended several talks today on DevOps, and most of them touched on continuous integration and delivery. Most or all of them were from large, successful companies, and it seemed like the pattern was that none of them started with a well-oiled machine. Then at some point in their success they realized, "Oh, we need something more as we're scaling." Imagine you're a startup today, and what you care about is establishing product market fit and getting a product out. Which of these principles do you apply at the very beginning? How much do you invest in continuous integration and delivery at that point, when you don't even know whether your product is going to attract anyone?

Kromhout: I'm thinking Slack definitely has opinions there. I will say that before I worked at Microsoft, I worked at a couple of startups, and honestly, everything we did that didn't move us toward our core mission was probably a waste of time. I can talk about one mistake we made at one startup: we looked at all of the SaaS providers that did logging and monitoring and said, "That looks really expensive. I can stand up an ELK stack in a day." And then that ELK stack paged me a lot. So when we look at the cost to do something, with anything you can get away with not doing, whether by using SaaS or Serverless or whatever, you're just moving the bits that don't get you to product market fit into somebody else's hands. Maybe someday you do need to build out your own stuff. I really want to hear what Holly has to say about this, because I know Slack has gone through a bunch of iterations in that area.

Allen: Yes. No, I completely agree. When you're starting, and always, you need to enable yourself to push out changes as quickly as possible. I think that setting it up so that it's easy to push changes, so that you can get to product market fit fast, is actually part of the critical path. But I completely agree that if you can get away with outsourcing that to a SaaS provider or something, then go for it, because building it yourself isn't inherently critical.

Kerr: And at a startup, you can. At a startup, you don't have to deploy to legacy systems. You can choose an architecture that works on Heroku. You can fit into existing ecosystems that will give you that for free or for money, but it'll be free in terms of attention.

Stratton: The thing I was just going to say is that the philosophies and theories don't go away. That was sort of what Greg was saying. Should you invest in continuous integration and continuous delivery? Philosophically, yes, even more so, because you need to get that feedback as quickly as you can. But again, if we're going to quote Charity all the time, "The best tool is a tool you don't need. The second best one is the SaaS tool."

Reed: So let me ask you a question, how many engineers in the startup you're talking about?

Man 6: A team of maybe a dozen people.

Reed: A dozen people? Okay. So here's the thing, the argument that I would make is you can't do continuous delivery, unless you're doing continuous integration. And if you've got a team of 12 people and they're all kind of working in the same space, continuous integration can be, "Hey, Sally, I can't build. Hey, Bob, I can't build, fix it." Or maybe the code base is small enough that I can fix any part of the code base because the product is small enough.

And that's actually okay. Jez Humble tells a great story about what I think he called CI on a dollar a day or something, where they had a rubber chicken, and the rubber chicken went around, and that was the CI lock. That worked for some number of people. At some point, you will get to where that doesn't work anymore, and then you'll have to look at some of the patterns that large established companies use, and there's a lot of domain knowledge there that you can draw on. But you can do CI very manually. And if that works for your team, where it's you in a garage and you can see everybody, cool.

Kromhout: Even the gov.uk people used the Badger of Deploy for a while. It was an adorable little stuffed badger that they kept sitting above the cubicle of whoever was doing a deploy.
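The rubber chicken and the Badger of Deploy both work as a mutual-exclusion token: only whoever holds it may integrate or deploy, and it passes to the next person once their build is green. A minimal hypothetical software version of that idea follows; the class and method names are assumptions of this sketch, not anything from either story.

```python
from collections import deque
from typing import Deque, Optional


class IntegrationToken:
    """Hypothetical stand-in for a physical CI token (a rubber chicken or a
    deploy badger): only the current holder may integrate or deploy."""

    def __init__(self) -> None:
        self.holder: Optional[str] = None
        self.waiting: Deque[str] = deque()

    def request(self, dev: str) -> bool:
        """Take the token if it is free, otherwise join the queue once.

        Returns True if `dev` now holds the token.
        """
        if self.holder is None:
            self.holder = dev
            return True
        if dev != self.holder and dev not in self.waiting:
            self.waiting.append(dev)
        return self.holder == dev

    def release(self, dev: str) -> Optional[str]:
        """Hand the token to the next person in line (if any) after a green build.

        Returns the new holder, or None if nobody is waiting.
        """
        if dev != self.holder:
            raise PermissionError(f"{dev} does not hold the token")
        self.holder = self.waiting.popleft() if self.waiting else None
        return self.holder
```

The queue gives the same first-come ordering people get by walking the chicken over to the next desk. As Reed notes, at some team size this stops scaling, and that is when the patterns larger companies use start to pay off.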

Burrell: One thing I'd like to add is that it's easy to look at companies like Netflix or Slack and say, "Oh, they've got it all figured out." I'm hoping in 10 years, we look back and say, "Well, we thought we had it all figured out, and it was definitely better than when we first started, but now it's even better." So it's that continuous improvement: "How can we do it better?"

The other thing I'd like to point out is, it's tempting to say, "All right, well, these big companies have it figured out, let's do exactly what they do," right? And a lot of what we do may just be overkill for your needs. It may be too much.

Reed: I was just going to say I love the last slide of your talk. You have to tell them what it is. They didn't know.

Allen: You mean, the part about the combine? Oh, yes, the questions. Actually, shout out to Jessica. She actually gave me that. And we were talking about my talk. And she's like, "Oh," and then she summarized it that way.

Kromhout: Holly, say what your last slide was.

Allen: My last slide was "Learn faster." But the one before that was "Copy the questions, not the answers." Ask what's going to work for you.

Kromhout: And I think that that's a really good point because so often we look at the architectures that organizations built starting a while ago with different constraints and different sets of options out there. If it is 2014 and you're deciding to do some big data, you're probably using EMR on AWS. And that's probably one of the only choices out there. In 2018, you have other choices, various cloud providers, various SaaS options, and maybe you've decided that the hype cycle is such that the next thing you're trying to do is something completely different.

So if you're trying to set up the things someone else did, remember, they were operating with specific constraints, solving a specific problem at a specific moment in time, with their staff and the cloud and SaaS offerings that existed then. It doesn't make any sense at all to try to replicate that.

Continuous Profiling

Man 7: Hi, my name is Ravi. This question is for Greg. As a DevOps person, sometimes I may not have a complete idea of what my application is doing at the time of an incident. If I had profiles collected at that time, through continuous profiling, I might have an idea of what the problem was during that incident. Does Netflix have any tools that do continuous profiling, or initiatives to build better tooling like that?

Burrell: I wish we had good tools for continuous profiling. I wish I could tell you, just go out and get this and that will solve all your problems. No, unfortunately, we don't have tools like that. This is part of the ongoing problem. We need to identify these things and work with our tool teams. One of the problems we've had in the past is that our tool teams often delivered tools that they wanted to write, tools that were fun to write, tools that looked really cool, tools that maybe they could turn into an open source project, but not the tools that solved our problems.

Kromhout: Are you saying resume-driven development?

Burrell: What I'm saying is that we're trying to turn this around and drive it from the bottom up: what are the needs? Then let's build the tools to fit those needs, not the other way around.

Cultural Shift in Operations

Man 8: I work at Crunchyroll, and a lot of the talks I've heard are about developer empowerment and getting the power to the developer. I'd like to hear from the panel: as big organizations go through a transition, there are issues that probably need a culture shift within the organization, where operations may have the access and the permissions, and they may feel disempowered if they give them away. That needs a cultural shift both in operations and across the organization. How does that come about? How can anyone bring about those changes, and how do you go about it?

Kromhout: You just gave a talk about doing that at Slack. Do you want to give us a quick summary?

Allen: Yes, absolutely. Anytime you change someone's job, they're going to have some feelings, right? On the one hand, you're giving developers more work to do, and if you haven't properly staffed up yet, they've got some feelings about that. At the same time, as you said, people might feel like, "Well, what's my job now if they're taking care of that stuff?" So it's all messy, because it's all humans and people have feelings. But what I've seen work really well is doing the solution-setting all together, right? For one thing, it takes time. You're not just going to announce, "Okay, on Monday, all the Devs are taking care of production now."

So there's time for everyone to think about what is going to be the specialty of their group. You would also expect leadership to be setting some kind of vision, and including everybody in that vision-setting. And that can also change and evolve. So you could think about, for example, a subset of the old operations team that became the cloud engineering team. They really focused on creating the best cloud infrastructure and platform for Devs to use. It would be really opinionated and very easy to use, so that Devs wouldn't have to go all the way down to the bottom layer. So the best advice I have is to include people in the process continuously, so that they feel empowered throughout. But Netflix went through the same thing.

Burrell: Yes, we went through a lot of this as well. And I think that's really good advice: include people in the process, and don't set it up as an "us versus them" situation, or people are really going to dig in and protect their domain, protect the realm. Really, it's about what common problems we have and how we can all solve them. This may mean that one particular team no longer looks the same; it looks a little different and does things a little differently to solve a particular need.

Reed: And to that point, it's interesting: you said developer empowerment. The question of, "Do we empower developers, and how do we do that?" is different from, "What is the power in the system, and what are the power dynamics?" Empowerment is not the same as power. So when we talk about who has the ability to do that, sometimes it could be that operations still does it, but acts in a way that empowers the developers to do their jobs better. Sometimes that's the case, sometimes not. My point is that those concepts are related, but they're not the same.

Kromhout: And creating and running a self-service platform doesn't mean everyone just YOLOs whatever they want out into production. Say a development team is like, "Cool, I want to put my logs in a totally different place that doesn't feed into our central tooling at all." Then, to the point of what you were talking about earlier with observability, it's like, actually, we probably do need to let anyone else who's trying to troubleshoot problems throughout the system see where your stuff is going. So there probably will be some platform-team-level decisions that aren't so much imposed as they are a common shared service.

Transformation from a Team of Generalists to a Service-Oriented Ownership Model

Man 9: We're a company that's just about to start this transformation: from a monolithic to a distributed application, and from a team of generalists to a service-oriented ownership model, with a team where the operations group is a completely different silo and very antagonistic at times, or at least it seems that way. About the comment earlier, your second-to-last slide about copying the questions, not the answers: is there a question you wish you had asked first, after having gone through this transformation a number of times? That could be for any of you. There are a lot of questions, and answering them the right way might actually make a difference or save some problems.

Stratton: I think it's about asking questions of the other groups, the groups you're trying to work with. There are two ways to phrase this: the cynical and the optimistic. So depending on the personality, you can imagine which way you take it when you're talking to Ops people, says the Ops guy.

Kromhout: I resent that remark.

Stratton: The cynical way is, "How does your life suck?" right? And the other one, which is what I used to do when I led organizational transformations, and what I wish someone had asked during the transformation I went through, is, "I have a magic IT wand; what's your one wish?" You'll learn a lot about those other groups by what they wish for, and what matters to them. The point of both of these is to be able to provide empathy to that group, to understand them, because as software engineers we make a lot of assumptions about what Ops folks do, based on where our interface point with them is.

But there's a whole bunch of stuff that happens on the other side of that "software contract," inside that black box, that we need to be empathetic to and understand. Likewise, as SREs or TechOps or whatnot, we have a lot of opinions about software engineers, but they're mostly based on where our interface points happen, not on the things they do in their day-to-day. So any questions that help build that empathy are going to help you drive to common solutions faster.

Allen: Yes, the question I was thinking of before you started talking is exactly along those lines: "What are you afraid of?" Some people are afraid of not having a place in this new world, some people are afraid of having to learn a whole new skill set; it varies, right? But again, it comes back to empathy, because it is all about collaboration: reaching across and getting rid of the idea that there are these silos. Because the whole idea is that we're trying to, frankly, usually make the company more money, move faster, and have everyone generally have a pleasant working experience. So how do we actually get there together instead of with our siloed agendas?

Stratton: The fun thing is that a lot of times we want to avoid these empathy-type questions because we don't want to solve these squishy problems. But once you ask these questions, they often lead to technical challenges; they lead to solving puzzles. So we get there.

Kromhout: So what you're saying is, if somebody really wants to Kuber some netes, what they should do is find out why the other people in the org don't want them to, or what those people would prefer to be doing. Instead of just saying, "I want my thing," and then, great, now other people are going to argue with you because they don't like it for some reason. Find out what they want and why.

Burrell: I would just like to add that in my answer to the opening question, I called out the thing that I feel strongly about: culture. And this is sort of a cultural thing. We often set up different groups almost in competition with each other: maybe the developers look down on the testers, the testers resent the developers, the operations people resent the developers. Well, everyone resents developers.

Stratton: "Ops has no dog to kick," I always say.

Burrell: In a sense, the culture can set up these groups in competition with each other. But when you start to break it down and say, "Aren't we all trying to solve the same problem here? What problem are you trying to solve that's different from ours? What are your concerns, your fears, your pain points? How can we better work together to solve those in common?" That's a way to create a culture of really working together instead of against each other.

Reed: I'm reminded of Matt's earlier answer, where it's like, "Do you know what the business does to make money?" There's a stronger version of that question if you're talking about a transformation. It's sort of like, "What are we doing here?" That sounds very existential, but what I mean by that is, "Okay, why are you doing a DevOps transformation?" Because a lot of organizations are like, "Well, I read it in the seat-back Delta Sky Magazine in first class, and it said, 'Do DevOps.'" Yes, I got it. Yes, they are, like in Forbes and stuff.

So where that's relevant is that in a lot of transformations, you see the Office Space mid-level manager who's like, "I'm the person that takes the plans from the customer; that's my job." And they can't see any of that vision because that's their job. They don't really know the direction the organization wants to move in, or any of the underpinnings of it. And sometimes, it turns out the organization doesn't really know either; they're just like, "I read we've got to do DevOps, so that's our Q2 DevOps plan." So it's a good question to ask, even though it's a little existential.

Kromhout: Well, let's be honest, I think about Babylon 5 a lot, and I think about Morden's question. There's a bad guy on this TV show from the '90s who goes around to all the good guys, asks, "What do you want?" and listens to their answer. Most of them are just kind of like, "What?" And one of them kind of betrays his own inner struggle and corruption. Wow, you can find out a lot by just asking the open-ended question, "What do you want?" See what people say; they might surprise you.

Stratton: What do you want, Bridget?

Kromhout: Well, I do want all of you folks who are interested in Kubernetes to come to my talk because my talk is going to be super fun. It'll be in the time slot after this. I will almost certainly be awake for it. I just got back yesterday from eight days in Europe and I don't know why I thought speaking in the last time slot in Pacific Time would be a good idea. But it will either be entertaining or I might be asleep during it. We'll see.

Measuring Integration and Linking It to Company Results

Man 10: How do you measure integration between Devs and Ops, with a focus on continuous improvement, and link that measure to the results of the company?

Kerr: So how do we measure integration and link it to the results of the company?

Man 10: Yes, with data, linked to the results of the company: for example, the stock, or marketing, to improve the business. Measuring the relationship between Dev and Ops, of course.

Reed: So, a couple of things. Nicole has done some work in this area, linking the actual stock performance of companies to whether they have gone through DevOps transformations. I don't know where she is with it, though we were talking about it. That said, my immediate question when you ask that is, "Why is that important to you?" Because, as the saying goes, tell me what you're measuring and I'll tell you how I'm going to act.

Humans are great incentive-finding machines, and if you tell people, "Well, you get a bonus if you do the DevOps thing," they will find a way to game the hell out of that, right? So my question is this: I'm sure there's an implicit reason you're asking, and it would be interesting to dig into why that's the metric, integration between Dev and Ops, and why it has to be tied to stock performance; in other words, why that's relevant in the system you're interested in. I didn't mean that to be a mic drop or anything.

Conclusions

Allen: Learn faster.

Burrell: Pay attention to your tooling and your company culture.

Reed: Go out there and Dev the Ops and Ops the Dev.

Stratton: Take time to meet with colleagues and people in person. And my little pro tip that we've learned is to make plans to do so; don't just say, "We should co-work sometime."

Kromhout: Just because something is on the front page of "Hacker News" does not mean you need to YOLO it out into production immediately. People at your organization who want to do that might not have your organization's best interests and needs at heart. So just keep an eye out for resume-driven development, and it might be you. Do some soul searching.

Kerr: That goes all the way up to your CIO too.

Kromhout: Think about why you want to do something.

Recorded at: Jan 10, 2019