DevOps Is More Complex and Harder Than You Think. Personal Lessons


Summary

Patrick Debois shares his personal lessons and stories on DevOps. DevOps is inherently complex and with many things to consider, there are many risks and things that can be missed or go wrong. Debois' intent is to help people not make the same mistakes.

Bio

Patrick Debois is a co-author of The DevOps Handbook and a pioneer of DevOps. He is now working in the field of media in the race to make video truly interactive.

About the conference

Software is changing the world. QCon empowers software development by facilitating the spread of knowledge and innovation in the developer community. A practitioner-driven conference, QCon is designed for technical team leads, architects, engineering directors, and project managers who influence innovation in their teams.

Transcript

Debois: It was completely wrong. I'm not going to talk about anything disaster in tech. The deck is new. The reasoning is not new. It's my story. I hope it makes you think about your situation, your company, your problems as well.

That's me. I couldn't bring myself to paint myself as a horse, because I still feel like a donkey. This is a story from about five years ago, when I joined a company that was having problems. I was already five years into DevOps. I said, how hard can it be? I've been patient zero, as people sometimes refer to me. I had learned so much, seen, read, and talked to so many people. How hard can it be? It turns out, very hard.

Dev and Ops

Dev and Ops: you come into the company. I came in because they had a performance issue: an unzip of a file was spawned every time a new process started up. It's very inefficient. We're talking about live production systems, live television, live shows, all that stuff. They were doing Git checkouts on production servers. There was no build pipeline. There was no testing. I learned that testing in that context worked differently. We were working on prototypes. It was more important to understand whether we were building the right thing, and that was evolving all the time, than to build it right. We could get away with manual testing. Over time, we improved. I think the best we got was a deploy to production in 2 minutes, during a live television show. The producer had said to us on Slack, "Please fix this percentage. I cannot explain to people that a jury percentage of 70% for the winner and an audience percentage of 50% add up to 120%." The engineer of course said that percentages cannot go above 100%. We fixed that live, in production. We had a lot of issues because our third-party systems had a lot of failures during the load testing, but we managed. The company got confidence in us. We were doing it during a live show. People were happy. The show was happy, and so on. This was only the start. This was my way into DevOps: I went in for a performance issue.

Suppliers

We were working with suppliers, third-party systems that needed to work for us as well. It wasn't in the build pipeline. It was external to our company. Whether that's a service you use, that's a data source you use outside. It doesn't really matter. A SaaS. Later, we experimented with Lambda. The first time we used Lambda, we got an error, disk full. We didn't get it. We learned that instead of looking only inwards in our pipeline, that we had to work and see where our bottleneck was. The bottleneck was not inside our authority. It was outside. It was the supplier. It was the cloud solution. It was the video encoding system. We had to make friends with our suppliers. I learned over the years that on first use of a supplier, I would send them feedback from the first experience I had from the product. It even went that far that we were one of the first users for the mobile testing solution for Amazon, that I learned that they later used our ticket to go up to the engineer and say, "Somebody is using our product. We need to fix this stuff." Because when I listed all the mobile devices, it had blah-blah-blah in the list of the string. Working with suppliers is very similar to working with your internal people. They are craving for input. They want to know what you want. It's really hard to get honest, unbiased tech feedback. This was our first step.

I think at that time, I did a talk at Serverless Conference. I called it, "It isn't about being server-less. It's about service-full." We are relying on so many external services that we need to become friends with them. At the same time, I was thinking this is really strange. We always said Dev and Ops need to work better together. Get them in the same room. Get them talking to each other. Then all of a sudden, your supplier just has an API. It really felt like, is this the end of DevOps? We work with another company. We read the documentation. We use the API. There was no communication, no collaboration happening anymore. What I found out over the course of time is that suppliers give different cues on how they collaborate. One very good cue is that a lot of companies started writing postmortems of their failures and making them public. Another example is that Fastly exposed the internal errors of their systems to us, so we could see that they were failing and it wasn't just our problem. The whole idea is that DevOps started from systems theory: looking at the bottleneck, and asking where your next bottleneck is. For us, suppliers were the next bottleneck. We started making friends with them. That meant being on chat with them and seeing them at conferences. There are a lot of different ways that you can become friends with them.

Marketing

Then, I got shares from the company. I became company co-owner, just by going in from a tech problem, because the next problem was we didn't have enough users. Thinking again, from the system perspective, I could have left and said, "It's not a technology problem. I'm going away." Instead, I started working with marketing. Marketing, usually they can sell people HTTPS as it's the next new thing. We often joked with the marketing person that she actually did that once at the job like, HTTPS. I found out that they can be your biggest supporter for new features. They promote that to the world. They tell everybody about it, so it gives you a sense of feedback that feeds back into your operations. They are an ally in your work, as well. I found that by writing some blog posts, and some stuff we did internally, they could use that as marketing material. It opened a lot of doors. I wrote a comparison article on all the video streaming solutions for real-time video streaming in the industry. All the vendors came to us because we're vendor neutral. They pointed to us. That was brilliant for marketing, because now we had inbound leads, people pointing. All those companies were pointing to us. This is just an example on how the tech bottleneck moved into a marketing problem, and how marketing was helping us and we were helping marketing as well. The stereotyping from Dev and Ops, you don't work with each other. It was still there. Marketing folks, they talk about things that are magical and don't exist. If you sit down with them, feed them the right information, they can help you really well to set a correct tone in your company.

Human Resources (HR)

The next one was HR. We grew. We needed more people. Hiring people was a real problem. We were this tiny company of three or four people in Belgium. All the bigger companies were getting all the people. How do we get them? We went to local meetups to find people that we thought were interesting. We wrote honest articles on our website about how life was as we felt it. It wasn't about, this is what we're hiring for, with all the buzzwords or all the tech keywords. No. This is what life is, with good and bad. We were very open about it. We didn't get people that had the exact technical skills that HR was typically searching for. We found people with potential. We had some hires that turned out badly, in the sense that it didn't match in the end. We had good people where we saw potential. We improved our hiring mechanism so that it wasn't just having a few interviews and signing. We asked candidates to come over and spend some time with the engineering team, to see whether they were compatible. Because the engineers actually complained, "Why are you doing this hiring somewhere else? We need to see whether they're compatible." It shows what you miss if you have the narrow mindset of Dev and Ops just working together to try and fix the problem of the build pipeline. The problem is way bigger, and we can contribute way more to the organization than we think. We can get information from them as well.

Sales

Sales was hardest for us. When is this feature coming out? I want to sell this. They put so much pressure on us. Then I realized they are having the same problems we are having internally in engineering. Here's the backlog. What needs to be done first? Who gets the resources? Then the dreaded question, how long will it take to build it? The salespeople get the question, when can I have this feature? How much will it cost? I cannot buy it from you, can you do it cheaper? They're all having the same problems. They even had the same problem of being on call 24 hours a day, 7 days a week, if you're at a small company. They learned that they have to react as fast as possible. Speed was fundamental in having good sales. When it's on the mind of the customer who wants to buy something, at that time, you have to really fit in. Not an email after a week saying, "Let's schedule an appointment." We used the same techniques that we used to organize ourselves internally. We also asked sales, when they sell something, to alter the contract so that the prioritization discussion is already in it: what is sold is a number of days along with the scope. Then we can negotiate the most important things. That discussion was already in the contract, instead of coming too late, when the work had to be delivered.

Because of the pricing discussion, we learned how to build simpler architectures. I plead guilty as an engineer, to over-engineer stuff, and not to do the simplest things possible. They pressured us. They were right. Because if they didn't do that, and they didn't put the pressure on us, we would be losing customers because our price was too high. They asked me, how much does it cost to have one customer? Then we put a margin on there. Then we know what our price is. It's really terrible to ask an engineer what things cost, because he will build you the ideal model. Then put all the safety percentages in that calculation. Then they put a margin on top of it. You have to be open about the discussion. How the minimal thing is. What you can achieve. What it actually costs. Instead of trying to defend yourself ahead about having the discussions later that you have to have some margin. With margin on margin, you can't sell. I was really grateful for that. At times I hated them. I really appreciated that they were having the same struggle as we had. Once we move like, "Yes, we can build it." Those people who were interested, they really helped us, "Ok. Can we sell it?" That collaboration really led me to think broader on DevOps, where to go, where the bottleneck is.
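The "margin on margin" effect described here is easy to see with a little arithmetic. The following is a minimal Python sketch, not the company's actual pricing model: the function name `price_per_customer`, the base cost of 100, the 20% safety buffers, and the 30% sales margin are all made-up values for illustration.

```python
from __future__ import annotations


def price_per_customer(
    base_cost: float, safety_buffers: list[float], sales_margin: float
) -> float:
    """Compound each safety buffer onto the base cost, then apply the sales margin.

    All inputs are hypothetical; buffers and margin are fractions (0.20 means 20%).
    """
    cost = base_cost
    for buffer in safety_buffers:
        cost *= 1 + buffer  # each buffer multiplies the already padded cost
    return cost * (1 + sales_margin)


# Lean estimate: the minimal thing, with only the sales margin on top.
lean = price_per_customer(100.0, [], 0.30)

# Padded estimate: two engineering safety buffers, then the same sales margin.
padded = price_per_customer(100.0, [0.20, 0.20], 0.30)

print(f"lean price:   {lean:.2f}")
print(f"padded price: {padded:.2f}")
```

With these assumed numbers, the two 20% buffers compound before the sales margin is applied, so the padded price ends up well above the lean one even though no single percentage looks unreasonable on its own.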

Documentation Sells

We also found that documentation is really a selling point. We were coming into a B2B world where everything was, "You want to see our documentation? Just give us money." I would shy away from a product where I can't see the documentation upfront before buying it. It's all about getting that feedback cycle as fast as possible. Once we put all the documentation outside, all the things we could put outside, we saw an improvement in our sales process as well. It shows how two groups that are typically thought of as silos in an enterprise can dance together and improve things.

Support

While we weren't a big company, we found that our secret sauce while being small, was that people came for the features but they stayed for the support. Once we get them as fast as possible, like even on the first day, if we can get them on Slack, answer all their questions as fast as possible. Take all their worries away. I had a certificate expiring during a prime TV show. I had DNS expiring during a prime TV show. It's how you fix it. How open you are about communicating it. Not seeking the blame, "You didn't do this. You didn't do that." That mentality, that made our customers stay.

Legal

Then another ally, which I didn't expect, the legal department. Any people from legal here? When the GDPR stuff came in there, the engineer started to think a lot more about security. Also, we had a security incident, not exposure of data or something. It was really nice the way they wrote the contract, that it had our back in a good way. That collaboration, instead of having a contract, you have to be 100% secure. That's how these legal things go. Anything you don't do is your fault. Then by reasoning and helping them to write that in modern times, what we could do, what we couldn't do, and be upfront on that, even in the contract, was really helpful when somebody called up on an issue we had. That we could say, we're upfront about it. This is that. This really helped having our back.

Finance

Then finance. Small startup, money, customers. I didn't realize upfront how much I would worry about the money, being a business owner. Can we pay the people? How about the cash flow? Is it coming in? It also changed some of the decisions we made in the product. We didn't care in the beginning about reporting, like who used what on the show. Servers are up or down. Finance said, we need this billing. It needs to go as fast as possible because we're going to have an issue with cash flow if we don't get the payments rolling quite soon. Again, something you wouldn't expect, thinking of DevOps in a small sense. If you think broadly, where's your bottleneck in your company? It might not be because you tuned the kernel to 2% more. If you're not having money, that's a bigger issue, because the whole company might not exist anymore. Always think about, where is your next bottleneck? That's the mantra I've always used going into these companies, and think about the problem.

The End

Then I decided it was enough. I was working so many hours. At some point, I thought, we are compensating in engineering for the fact that we aren't selling enough. Doing the long hours. Technology on a shoestring budget. I learned about something that's called hopium. Who has heard about hopium? Hopium is the drug when your salesperson says, I'm going to sell to five customers next quarter. Then it's going to be the next quarter, and the next. Then there's no money. I decided to end the company. We tried everything we could from a collaboration perspective, having all these groups work together. I think I went broad enough, as some would say, BizDevSecWhateverOps. Still, our biggest bottleneck was that there wasn't a market fit.

For Sale

We actually started talking to companies that could buy us. I found out everybody in the field was struggling with the same thing. Everybody wanted to do it, but it didn't bring in enough money. We were thinking, we're doing it wrong, because the others are doing it and they have customers. They were 10 times our size. Secretly, they were doing other stuff, hoping this would take off. They already moved here. We were focusing so much on here. It's something you don't usually talk about.

MOB

Everybody is fired from their company, and you want to sell your software. That's an interesting period to be in your company, because as the biz owner, you feel responsible. When the news was said into the company, everybody had this down feeling. Everybody is looking for a new job. At that time I was looking into the concept of mob programming. We spent three days of writing documentation for the next company, so not even code. We had so much fun. It was insane. Everybody could just say, I've worked here for four years. I never knew that. The fact that we just sat together for a couple of days and wrote that documentation. For me, that was really eye opening. We thought we had a good team. We had everything working, and even a phone going off. When everybody has left and you need to transfer it, that's when you see everything that has become legacy. It's really painful because then you see all the problems you lived through. All the things you said, "Not now," and you're trying to sell it.

Exhausted

I think I was exhausted. It took me about three months to get back to my regular sleeping habits. My upside is that I really have a lot of empathy for people. It's also my downfall, because I wanted to shield the team from doing too much, from the long hours, so I did them myself. In hindsight, it wasn't the smartest thing to do. What do you do? You can't hire people. There's no money. You want to keep the same people because if they leave, you have a bigger problem. Yes, exhausted. My colleagues weren't too happy about it, but I said this is the end. We ended it. This is when I joined Snyk. I didn't even know DevSecOps was a big thing. I had been so focused on my company for four years. To me, it was just one of the things happening in the industry, and it wasn't that important. I learned that it was a big thing happening. They have been so helpful in having me join them.

Distrust and Control

Starting at the security landscape, one thing that I completely saw is that when you've seen Dev and Ops becoming DevOps, and now you see Dev and Ops, with security, there's this same level of distrust that was before between Dev and Ops. It's interesting to see that this is a repeating thing, going in companies. The notion of control, like ops used to be all about control. You cannot change something. You cannot do something. Now security was in that same thing. You cannot do this. You cannot do that. This control feeling is going away.

This made me think about, when you have a CI pipeline because part of the product we're selling is scanning image vulnerabilities. Is this actually increasing our trust in the security? I found out this is more about the confidence. If there's no notification with one or two messages, then it goes away. It's this confidence, if we can get the number away and nothing happened, we have this notion that it's becoming secure. We build the confidence by how often you hit something. We're not really addressing the issue of the trust and knowing for certain that something is secure.
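To make the distinction between confidence and trust concrete, here is a minimal Python sketch of the kind of gate a CI pipeline might apply to scan results. It is an illustration under stated assumptions, not how any particular scanner works: the `Finding` type, the severity names, the `gate` function, and the sample findings are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical severity scale, ordered from least to most serious.
SEVERITY_ORDER = {"low": 0, "medium": 1, "high": 2, "critical": 3}


@dataclass(frozen=True)
class Finding:
    """One reported vulnerability in an image (made-up structure)."""

    package: str
    severity: str  # must be a key of SEVERITY_ORDER


def gate(findings: list[Finding], fail_at: str = "high") -> tuple[bool, list[Finding]]:
    """Return (passed, blocking): the gate passes when nothing reaches fail_at."""
    threshold = SEVERITY_ORDER[fail_at]
    blocking = [f for f in findings if SEVERITY_ORDER[f.severity] >= threshold]
    return (len(blocking) == 0, blocking)


# Sample scan result with one finding below and one above the threshold.
sample = [Finding("libexample", "medium"), Finding("libother", "critical")]
passed, blocking = gate(sample)
print("gate passed:", passed)
print("blocking packages:", [f.package for f in blocking])

# An empty report passes the gate, but only means that nothing known was found.
clean_passed, _ = gate([])
print("empty report passed:", clean_passed)
```

A passing gate only says that no known finding reached the threshold. That supports the confidence described above; it is not certainty that something is secure.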

Degrees of Freedom

If you look at it a different way, the more you add to your CI pipeline, the more the process is fixed. I saw a presentation from a guy at SAP, and he said, "Before we had 60 things to do before a release." It was a whole checklist. Now it isn't there anymore. They put it in their CI pipeline. Then I've also seen places where the CI pipeline became, you can't do this because it doesn't parse on our CI pipeline. Sometimes you wonder whether the established process is the one that gives the most freedom to do it right. This dogma of the pipeline is something that I found fascinating. It often comes down to the perspective from which you look at things: Devs don't do this. Sec doesn't do that. Ops doesn't do that. I'm now in a Dev role, and I don't think I know anything about technology, because I talk to people. It's always this mantra of looking at things. I learned that as an engineer, I was risk averse. I'm also risk averse in conversations. I would ease things out and being in product. It's something I learned as well that is not always a good thing.

Best Practices

I also found that these best practices, it's really hard. Who does Git commit reviews? For a conference, they had this GitHub repo. I changed my name because it was spelled wrong. Immediately I get, "You didn't follow the process." I was like, "I will take ownership if it fails. Don't you trust me to judge whether that was something, changing two letters?" It gave this false feeling where we're doing these practices because we don't trust the other, or we don't trust ourselves. I get that. Maybe it's masking the trust issue. The real pain is not having the conversations you need to have.

When we ended our company, I saw so many discussions come up again, discussions we should have had while we were still at the company. All these frustrations. All these things the other person didn't do, but we hoped they were doing. All these painful things came up. There is a cost of not trusting people. I'm not saying CI/CD is a bad thing. You really have to see through it and dig into the real pain. Are you doing the check-in and the build pipeline because you don't trust the other person? Or are you just doing it to have a good process? It is crucial to think about why you're doing that stuff.

Currency of Trust

Coming back to the trust issue, I've looked at it in as many ways as I could. Do you trust your salesperson to do the right thing? Do you trust your marketing person to do the right thing? From the description of trust I found, we tend to think a lot about competence. The Dev people don't know anything about security, says the security person. The Dev person says, security doesn't know anything about coding. It's not their job. It's not the competence they know best. We judge them for not having the same competence we have, at the levels we expect from that person. You can be competent at your job but be completely unreliable. Not showing up at the meeting that you're supposed to show up at. Not being there on time. Doing something, but on the weekend when nobody is around. Reliability is part of being at the trust center. Then sincerity is, do you do as your word says? I personally struggled with a person that we needed to fire. It's obviously not a happy thing to do. I struggled as a human with firing somebody else, because that's his income. That's his job. Then somebody said to me, if you're not firing that person, that is totally unfair to the others in the company. You say you care about them as well, but you're not sincere in your actions when you're protecting that one person. Yes, you can think about it, but where's your sincerity in your actions?

Care

Then, do you care? I wonder sometimes, imagine the CI pipeline has a fire alarm and has to leave the building, and you would have to do the same process manually. Would people do the same process? Would they still care? I found this out when our Devs had the power to deploy directly to Amazon Lambda. They skipped the build system and just deployed. That's fine. Even though they were doing that, in every one of those motions, do they care about the thing that we're trying to do? All of these things build upon trust. It shows at different places. We often reflect upon how much we can trust the others. We also have a job to do in making ourselves trustworthy. That comes down to the same things. You can be competent. It's an easy way. "I know this. I got this." Are you sincere? Are you reliable? Do you care? Because you can do all three and not care about the right thing to do.

Open Source Software (OSS)

In the context of my company, I had to evaluate open source libraries. What do you do to trust open source libraries? Of course, a tooling company would say, we run a check against the database to see whether there's a vulnerability, yes or no. It's a lot more complicated. We look at the competence. Some people look at the code on GitHub. Some judge the competence by the fact that the author already did some other projects. The sincerity could be checked by asking, are they open to pull requests? Yes, they have a code of conduct, but are they following it? Yes or no? Do they merge my pull request? Do they care? All these signals also exist in how you look at an open source project.

Services

The same in your pipeline. The same in your services. Do you trust somebody else by looking at the same stuff? The confidence, yes, Amazon is up. They're competent. They're reliable. Do they care? Maybe if you need to have that feature that they don't want to do because it's very niche, they don't care. You have to go to somewhere else. You always have to find a balance between all the different things.

DevSecOps

DevSecOps working together. This feeling of control that they say, we don't want to control the Devs. We just want to educate them, competence. We want to be sincere. Yes, we'll be explaining things and show them, and so on. Do they care? I asked a security person, how do you convince a Dev that he wants to build secure at the end? They all talk about the shared responsibility, but they never look at the reason why that person would care. You can make them care with a stick, but that's forced care. Why would they care? One way is appealing that they can become a better programmer, because by looking at security, sometimes the architecture of the application improves. There's different ways of looking at these incentives.

Team

Then just like a startup builds the trust in the market with little things like the documentation, the support, that's how we gain the trust to do stuff as well. Then as a manager, I explain, leading as you think, and walk, and be trustworthy. That's also quite important. Then even with the team, do you trust the others of the team, or just happily sitting together pushing code in the pipeline? Make that reflection now. Do I trust the people that I work on a day-to-day basis with? If not, look at the four things where you can improve. Do you want to become more competent? You want to see what you're doing? Do you care? Maybe if you don't care, you're in the wrong job. It's possible.

Self-Reflection

Then, on a personal note, think very much about yourself, because when they did this poll about how trustworthy you think the other group is, and so on, people always thought of themselves as more trustworthy than the others thought of them. People will tell you things like, "In that meeting, that wasn't right, what you said." Self-reflect on that. Have those real conversations about trusting people and working together, because the cost of distrust is just so high. If you're building all this stuff to control and make sure that these things don't happen, make sure we're having the right conversations there.

Resources

Some homework, some books to read. "The Thin Book of Trust" is really a thin book, but it puts it quite eloquently. There's a new one coming up, "Agile Conversations." The whole book is about, I meant to say A. I have this conversation, but the other proceeded completely different. They train you to reevaluate what you do, your actions. Nothing about tech, but what do you do? How do you build the trust? If you go into a meeting saying, [inaudible 00:39:55] will say, "I don't care. I want this architecture." That's not listening. It happens so often. People are set into a meeting with their mind, how things have to go. Then when you have the meaningful conversation, it turns out way better.

"Software for Your Head." It's a little bit nerdy. The book is trying to optimize for maximum bandwidth of transfer of not data, but intent. They have interesting ways of building that. You can say no in a meeting. They have the ceremonies. If everybody has to follow the core protocols to communicate, there's a lot of fluffy stuff and sentiment that goes away, and you get more to the essence. Obviously, it's not only on the individual, the team, but also the manager. That's the fourth book, leading people. Building that trust. Building that group to come together. Then last bonus is from the "Agile Conversations." It is written by these two people that have an agile podcast, which talks more about these things.

Tools

Maybe, 5% culture. All the rest keep explaining that syndrome, "I'm sorry. I don't think we saw that in the last 10 years of DevOps."

Wrap-up

First, I want to make you think broader, about the whole company. It's not just DevOps. Then introspect again: have you actually achieved DevOps as a collaboration aspect? That is very important.

Questions and Answers

Participant: I was interested in what you said about finding people with potential, and maybe not the skill. How do you do that? It's interesting.

Debois: The question is about, how do you find people with potential and not just about the tech skills? What worked for us was just saying that we weren't looking for specific tool sets. This broadened up the spectrum of people applying for the jobs, then we can just have that conversation. Often, they would have an interest in retooling, rethinking their spectrum. Even though they might have been senior, but in a different stack, we didn't care about that. We discussed, how's their experience? What they did. How they did it in this stack. Then part of our offering was that they'll be able to learn the new stack from the other engineers who already knew it. We were just open about it, from that perspective.

Moderator: It sounds like you're saying to us that the fundamental point behind DevOps was people crossing a barrier, crossing a silo barrier in an organization. It wasn't really about the CI/CD pipeline. It was about that barrier. Now we've added sec into the equation a lot. You're putting out, why didn't we add sales and marketing and things? It's hard to think of DevOps describing us crossing all the barriers in an organization. What should we be calling this? What should the movement now be if it's gone to every single silo in the organization now?

Debois: I would just call it agile. Historically, I went to speak at a lot of agile conferences, and they just weren't interested in the part beyond the Dev. The business part they would have been interested in, but they weren't that interested at that time in the operational side or the production side. Actually, I hoped at the time that one day we would have shared conferences, and it's happening here, for example. It's also requiring separate conversations. It has created a bubble in some way, so that people can be in the DevOps group. Frankly, I don't care too much about what name we use. I have no stock options in DevOps. It doesn't really matter to me. I think one day, on the 1st of April, I joked, "Let's rephrase DevOps to something else." All the marketing people were screaming, "Let's rephrase all the marketing material." The name isn't that important. I think, from my perspective, the industry has said DevOps is the pipeline. That wasn't the intention. Then you could say, agile didn't have the intention that agile was Scrum, and ITIL was the help desk system. That's just how history goes. I do believe that whatever the next thing is that comes around, we'll build on this, so that we still change the reasoning and help build that up, whatever the name is. I think the importance of the name in the beginning was about finding similar forces under a label where you can find them. After that, it didn't matter that much anymore.

 


Recorded at:

Oct 02, 2020
