Kate Heddleston on Improving the Usability of Ops Tools to Improve Company Culture

   

1. [...]Kate, who are you?

Werner's full question: We are here at Craft Conf 2016 in Budapest. I'm sitting here with Kate Heddleston. Kate, who are you?

That's a great question. I'm not sure if 28 years is long enough to really have a good answer but I'm a software engineer from San Francisco. I build web applications. I work mostly at or with very early stage startups.

   

2. [...]Can you tell us what that means? I mean ops is supposed to be arcane and cryptic, right?

Werner's full question: So you gave a talk here on Usable Ops. Can you tell us what that means? I mean ops is supposed to be arcane and cryptic, right?

Yes. The idea behind the talk is that kind of all of the internal automation tooling that we build around infrastructure and deployments and DevOps should be usable enough that every single person on your engineering team can actually access your system. They can deploy their own code, they can roll back their own code and they basically have a safe way to go through the process of releasing their own code out into the wild. And so that's the idea behind Usable Ops is kind of making Ops and infrastructure management so usable that everyone can use it.

   

3. Can you give us some examples of how to make Ops usable? Some more specific examples?

Yes. So one of the examples that I often give is having kind of one click deploy, and one click deploy is the same as one command deploy. The idea is that an engineer should be able to trigger a deploy of any set of code out to any environment with just a single command and that everything behind it should be so well automated that it can handle kind of rolling the code out to many different servers, it can handle failovers, it does all of the logging and exposes information to the engineers so they know exactly what's happening when they've triggered a deploy.

But at the end of the day, the actual human usable endpoint is just a single command or a single click of a button so that hugely reduces human error and it also really helps people understand exactly how they can interact with the system because pretty much everyone knows how to click a button or issue a single command to trigger a deploy.
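As an illustration of the one command deploy described above, here is a minimal Python sketch. It is not Kate's tool or any real deployment system. The `deploy` function, the `DeployResult` type, and the `push` callable are all hypothetical names, and the sketch assumes that pushing code to a single server can be expressed as one function that raises an exception on failure. It shows only the shape of the idea: one entry point, a rollout across many servers, and a log the engineer can read afterwards.

```python
from dataclasses import dataclass, field


@dataclass
class DeployResult:
    """What the engineer sees after triggering a deploy (hypothetical type)."""
    succeeded: list = field(default_factory=list)
    failed: list = field(default_factory=list)
    log: list = field(default_factory=list)


def deploy(version, servers, push):
    """Roll `version` out to each server in turn, logging every step.

    `push(server, version)` is an assumed callable that installs the code
    on one server and raises an exception if that fails.
    """
    result = DeployResult()
    for server in servers:
        result.log.append(f"deploying {version} to {server}")
        try:
            push(server, version)
        except Exception as exc:
            # Record the failure and keep going with the remaining servers.
            result.failed.append(server)
            result.log.append(f"FAILED on {server}: {exc}")
            continue
        result.succeeded.append(server)
        result.log.append(f"ok on {server}")
    return result
```

A real tool would also need failover handling and a rollback path. The point of the sketch is that the human-facing surface stays a single call, however much automation sits behind it.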

   

4. Is it just about deploying? Is it also about monitoring or other systems?

Yes. So one of the other examples I give is creating a visible system. At a lot of companies, the infrastructure architecture is held in the heads of a few people and there's usually a whiteboarding session that happens with new engineers when they come on board and someone draws some boxes and some arrows on a whiteboard and they're like this is how our architecture works but really, all of that information should be stored in a database and used for all of your automation tools and you should expose information about this system to engineers on the team.

So they should be able to come in and see what services exist, what code is running on them, where those services are running on actual resources and how they connect to other services. You can also link that up to deploying so I could go in, I could see a service running in a specific environment for example many companies have a staging environment where they put out code before they deploy to production so I could go into a staging environment in a specific service and I could actually trigger a deploy to that service.

And then you can have it hooked up to all the monitoring so you could see “All right, we're monitoring these types of things and if I trigger a deploy, does monitoring stay the same? Does something happen? Do I get more 500 errors?” And then being able to roll back as a result of something happening with monitoring. It's just a really great way for people to interact with the system. So they can see it at a high level, they can understand how it works, and then they can go in and interact with it in the ways that they need to interact with it which for the average application developer is mostly around deploying and rolling back that code.
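The visible system described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the design of any particular product: the `Service` and `Registry` classes, the `deploy_with_rollback` function, and the `error_rate` callable are hypothetical, an in-memory dictionary stands in for the database, and monitoring is reduced to a single number (the fraction of requests returning 500 errors).

```python
from dataclasses import dataclass, field


@dataclass
class Service:
    """One service as it runs in one environment (hypothetical record)."""
    name: str
    environment: str
    version: str
    host: str
    talks_to: list = field(default_factory=list)


class Registry:
    """In-memory stand-in for the database that describes the system."""

    def __init__(self):
        self._services = {}

    def add(self, service):
        self._services[(service.environment, service.name)] = service

    def get(self, environment, name):
        return self._services[(environment, name)]

    def in_environment(self, environment):
        """Names of all services running in an environment, sorted."""
        return sorted(
            s.name for s in self._services.values()
            if s.environment == environment
        )


def deploy_with_rollback(registry, environment, name, new_version,
                         error_rate, max_increase=0.01):
    """Deploy `new_version`, then roll back if the error rate jumps.

    `error_rate(service)` is an assumed callable that returns the current
    fraction of 500 responses for the service. Returns True if the new
    version stayed in place and False if it was rolled back.
    """
    service = registry.get(environment, name)
    previous = service.version
    baseline = error_rate(service)
    service.version = new_version
    if error_rate(service) - baseline > max_increase:
        service.version = previous  # roll back to the last known version
        return False
    return True
```

Because the same registry backs both the view of the system and the deploy action, an engineer can look up a service in staging, trigger a deploy to it, and see whether monitoring changed, all against one source of truth.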

Werner: So listening to that, it sounds a lot like basically the universal command to developers to document their code but it's not documentation but more of a live visualization of the code in a way.

Yes. It's like a live documentation of your system. So your system, you might have several different code bases running in different places and it gets really hard to discover what are all the different pieces of this system and how do they connect and what can talk to what and especially there's companies, small companies even will have 30 micro services. Big companies will have 600 to a thousand micro services.

There is just absolutely no way that a human can remember all of them. It has to be documented somewhere and rather than putting it all in some sort of Wiki that has to be manually updated, you should just really have your system kind of stored in the database and stored using these crazy things that we call computers that are useful at remembering things so that you can see the system, you can also use that for all of your automation tooling as well.

Werner: Yes because the old saying goes, paper is patient and Wikis are even more patient. You can write any old crap in there then not update it and having an active system that actually verifies what is said in there or actually represents what is actually there is really important.

Yes, and it is something tangible to people and I always say that engineers are humans too so we are prone to all of the same human errors that normal humans are prone to and having kind of a tangible system that we can see and discover and interact with is -- it's very important to all humans and so yes, I think that's a huge part of building a usable automation tooling is giving people these tangible ways to interact with the system.

Werner: What's interesting you bring up micro services because it's a massive problem in that area basically or as you say a requirement basically.

Yes, and especially here at Craft Conf, everyone's been talking about micro services. People say “We have a thousand micro services” and I think “Oh, my god. That sounds horrible”, because you know that they probably don't have the support tooling that they need to make interacting with a thousand micro services a really delightful day to day experience for engineers and yes, it's really difficult if you're working on a team and you can't effectively get your work done.

I think most people just want to do a good job and a lot of times, a lot of the automation tooling, it gets in the way of that or there isn't enough automation tooling or you have to go to kind of a central DevOps team that's a gate keeper and it just creates friction points and it creates a sometimes bad experience for a lot of engineers where they don't have autonomy, they don't have control and also sometimes, they have to go to the central group of people who may or may not be pleasant to work with depending on the team and the company.

Werner: The wizards and the people in lab coats basically like in the olden days when you had to send your code on a punch card to someone to handle basically.

Yes, and that's funny because in some ways, things haven't changed. We still call it throwing code over the wall. And there's a lot of companies where there's only that one small team that is allowed to release code, the release engineers. And yes, you have to throw your code over the wall to them. I know of a company that had a release engineer where -- so if there was a deploy going out, you had to be in IRC. And if you weren't in IRC, he would come find you no matter where you were and he would scream at you in front of everyone just scream at you for not being in IRC when your code is going out.

You could imagine that this had to have happened, right? Think about just people and how forgetful we are just coordinating how many people's code was going out so on like probably a semi-regular basis, this guy was just screaming at people in the office so that's I'm sure a wonderful thing to experience.

Werner: He was probably popular.

I think he needed a vacation. I think he was really widely -- what's the word? He was revered in his own way because he managed this very complex system and he was seen as being very credible. And what's weird with that kind of behavior is that when people see it, they validate it. They believe that there's some reason why that behavior exists that would justify the behavior and I don't actually personally believe that. I don't believe that there's ever a reason why you should scream at your co-workers.

Don't get me wrong, I love computers and I think that so much of the work we do is important but it's not that important and so -- but what's interesting is that the dynamic is that people are rewarded for that behavior and they often justify it by saying well, they need to be there when their code is going out. What if there's a problem? But then well, the fact that you're screaming at someone in the office is a bigger problem to me.

   

5. Well, so maybe we can go into the aspect of what effect Usable Ops has on culture.

Yes. I think we've been starting to get into that where Usable Ops is about building really great technology solutions so that the workflow and process for engineers around releasing code is much more streamlined which helps make it possible for you to go faster but to go faster safely so you could deploy more quickly but you can roll back and we can monitor everything effectively and I think that creates a really great system for productivity which is great for companies' bottom lines but it also has this added benefit of if process is really nice, then there's a lot less friction on teams which means that the overall culture is better.

It's pleasant to work someplace where you feel as though you can get your work done effectively where you feel as though you have good relationships with your co-workers where no one is yelling at you, where there's camaraderie and productivity and people feel effective. That's a positive work culture and creating great automation tooling does that. Conversely, there's a lot of companies that don't have very good automation tooling and there's a whole host of problems that comes out of that.

One is that your system is often really fragile so it's more likely to go down, bugs are more likely to take down your system which is probably where a lot of the bad toxic behavior comes from is because there's teams of people that are trying to protect these very fragile systems and they believe they're justified in being incredibly rude to other members of the team who perhaps inadvertently break this fragile system and you create this system where there's a lot of toxic bad behavior which has a lot of effects on -- causes attrition, it makes people unhappy. I believe it has negative consequences for things like diversity. So in toxic environments where someone is coming and screaming at you is not great especially for people who are less protected in the industry.

Werner: So that's an interesting topic that this might actually help diversity to some degree.

Yes. I write a lot about how toxic engineering environments hurt diversity and it's this idea that toxic environments often hurt everyone on the team but they especially hurt people who are less protected so if you're the only female engineer on a team or you are the only person of color on the team, you're just more vulnerable because you are the only one of your kind.

So when the release engineer comes to scream at people, it's not the same thing if you feel like you're a protected, connected member of the team versus someone who feels very alone and vulnerable on the team and so these toxic behaviors are not -- they don't have equal effects on every member of the team and yes, you'll find that certain groups of people are really a little bit more sensitive to these toxic environments because they just don't have as much protection from them.

   

6. Can you give some other examples maybe of toxic management issues?

Yes. So I have a series of blogposts on these things at https://kateheddleston.com/blog but one of the most popular topics was criticism, so criticism and ineffective feedback. Women are given disproportionately more criticism than men in the workplace. However, all humans no matter what they look like on the outside hate criticism. We all respond really negatively to criticism. We get defensive. It actually hurts our overall productivity so if you receive a lot of criticism, it affects your performance and so if one group in the office is receiving more criticism than another group that can actually have unintended but really dire consequences for the group that's receiving more criticism and it's pretty well documented that women receive significantly more criticism than men in the workplace. Who knows why? But my advice is to remove all criticism.

So everyone hates criticism so let's just get rid of it. It levels the playing field but it also improves the environment for everyone and the ways that you can do that are by giving people feedback on the kind of behavior that you want to see from them so rather than going in and saying you're doing all of these things wrong, instead walk in and say these are all the things that I need you to do and people actually respond very positively to that. They can extrapolate how their current behavior is perhaps not quite meeting the bar but at no point did you ever come in and criticize their actual work.

You simply said “This is where we need to go” so there's ways to actually kind of remove criticism. Another one I've talked about is argument cultures. So how arguments create really toxic environments. I have one called The Null Process and it's all about how there is no such thing as no process. It's like a null pointer. It's something that points definitively nowhere but you still have a pointer and it can be incorrectly used and dereferenced to point to garbage.

And then let's see, technical onboarding, training, and mentoring is the final one that's very popular. It's about how we need to train engineers when you hire someone, they need to be onboarded into your team and if there is no onboarding, then the people who are most likely to be successful are people who are like the existing group.

So if the existing group is white and male, a new white male is going to have an easier time being successful than if you know you have a group that's white male and you have a woman of color come on board because when there's no explicit onboarding, it creates a system where people have to use kind of the internal culture of the company to get onboarded and you're more likely to be successful if you are already culturally integrated with the group and so it's just yes, I write a lot about how there's -- I call it death by a thousand paper cuts and it's like here's the individual paper cuts and how they might actually be hurting diversity on teams.

   

7. Very interesting. We find that on your blog?

Yes.

Werner: So to wrap up, you are starting a company.

Yes. So I was working on a previous startup idea and I built myself an internal tool to basically solve all of the problems that I have seen and I'm writing these blogposts and going in and consulting at companies and then people said, “Hey, we would use that.” So I am working on productizing it and rolling it out to beta customers and it's a tool that basically helps teams of engineers manage their infrastructure and deploys to make just a really nice process around how people release code.

Werner: So I guess we'll keep an eye on that. We'll keep reading your blog and you'll update us on your progress and your future writing.

Yes, definitely.

Werner: Thank you, Kate.

Jun 24, 2016
