
Security and Compliance Theater - The Seventh Deadly Disease


Summary

John Willis describes the “Seven Deadly Diseases of DevOps” with a focus on the most costly of them all - Security and Compliance Theater. He talks about the practices needed to create long-term systemic “safe” improvement.

Bio

John Willis is the Founder of Botchagalupe Technologies. Before this, he was the VP of DevOps and Digital Practices at SJ Technologies and, before that, the Director of Ecosystem Development for Docker, which he joined after the company he co-founded (SocketPlane) was acquired by Docker in 2015. He is the author of 7 IBM Redbooks and is co-author of "The DevOps Handbook" along with Gene Kim and Jez Humble.

About the conference

Software is changing the world. QCon empowers software development by facilitating the spread of knowledge and innovation in the developer community. A practitioner-driven conference, QCon is designed for technical team leads, architects, engineering directors, and project managers who influence innovation in their teams.

[Note: please be advised that this transcript contains strong language]

Transcript

Willis: I'm John Willis, I work at Red Hat. I'm jwillis@redhat too. I've done a bunch of stuff. If you're really interested, I've been involved with every one of those companies, and that's only the last third of my career. I'm old too, so I've been doing this for a long time. Ubuntu; I was early at Chef. I sold a company to Dell, sold a company to Docker. One of the founding members of DevOps itself, DevOpsDays, DevSecOps. I co-wrote "The DevOps Handbook," and I've written 10 books and done 10 startups over the years.

As of three weeks ago, I joined a new program office at Red Hat with Andrew Clay Shafer, one of the co-founders of the DevOps movement, and Jabe Bloom. I'm just going to point this out. This is a book that I wrote earlier this year. I worked on it with Nike, Sabre (Mike Nygard at Sabre), Capital One, Marriott and PNC Bank. We're going to come back to this later, because this is the second half of the presentation.

This is called the Seven Deadly Diseases. I'm going to focus in on what I call the deadliest disease of them all, security and compliance theater. About two and a half years ago, I left Docker, and I thought, I'm going to really fix the industry. I've got all these tools, lean tools. What I found is every time I tried to use a prescriptive tool, it got in the way of really getting the truth. Even the word devops got in the way, and agile. People are "I don't like devops." I'm, "Good, because we're not going to talk about devops today."

These were abstractions, and what I really wanted was to figure out why you suck. What is that bad organizational culture that just systemically lives in your place? These things just got in my way. I made a good living off of some of these, but the point is, I'm trying to do what I call organizational anthropology.

Organizational Anthropology

I do this thing where I go in for a month in a company and I talk to 200 people. I bring in notepads that people make fun of me for. I don't touch anything, and I go back to the CIO, and I tell him, "Hey, you suck." I'll never be able to do this at Red Hat now, but there was a day where, ok, I'd put my feet up on their wooden desk with a cigar: "Hey, John, how did we do?" "You really suck, bud."

What I found as I was doing this, as I de-prescriptivized myself from all the tools I thought were important, was that there were these patterns. I could actually learn a lot about your organization if I could just, in my head, put these patterns down for myself. Now, seven's a cool number. It could have been six, it could have been eight. It actually works really well when you're submitting a presentation to call it the Seven Deadly Diseases, but they were just patterns.

I've done longer presentations, and I'm not going to go into all the patterns here. I've got other ones, look me up, about not understanding how work flows; multiple systems, like 10 ticketing systems with all different contexts. Misaligned incentives is a classic devops story. Tribal versus institutional knowledge. If you read the "Phoenix Project," the Brents. These people that seem to have all the knowledge and they're so busy they can't share it. Incongruent organizational design, I have a great story for you there, on Conway's law not applied to microservices. How about that? Then, complexity.

Here's the thing. I wish I could say, three years ago when I left Docker, I was a genius and I figured out how all this worked. When I went into non-prescriptive mode of just having conversations, I recognized these patterns. The one thing that just always happened was that it boiled down to "Your audits suck." The point is, what I found was these things funneled down, at the end of the day, to when I'm sitting with the CIO, and I go through all this. I'm "At the end of the day, what you think you're doing from a risk profile, governance risk and compliance, is fantasyland. It is compliance theater." I'll open up the notebook and I'll show you why it's just nonsense.

I call it a negative risk ROI. I'm going to try to turn this into a full audiobook, because each one of those seven could be an hour by itself. We're going to focus on how I get at security and compliance theater through this lens that I use to do organizational forensics or anthropology.

How many people have heard of the Abraham Wald story? It's a fascinating story. During World War II, they did the Manhattan Project. Down the street, there was a project of statisticians and mathematicians to figure out a lot of things. How do you use stats and math to win the war? One of the things Abraham Wald in particular was working on was how to repair bullet-holed fighter planes. They did a lot of stats. What's the right metal ratio? Where do you repair? At one point, Abraham Wald has this epiphany, "Everybody stop. We got it wrong. We're looking where the bullet holes are. They're the planes that are coming back. We should be looking at where the bullet holes aren't." This is a metaphor for – I got this from Sidney Dekker – what they call survivorship bias. To me, it's the cognitive bias of: I don't want you to tell me what's great and what's bad. I want to figure out where the bullet holes are by having conversations where the bullet holes, in other words, aren't.

Complexity

For those of you who read "The Phoenix Project," it was actually based on a book written in the 1980s called "The Goal," by Eliyahu Goldratt. Twenty years later, Eliyahu Goldratt did "Beyond the Goal." It's an audio-only book. In fact, it's the reason why I came up with "Beyond the Phoenix Project." It's brilliant. He talks about complex systems and the theory of constraints in his 20 years of learning. One of the really interesting things in it was he asked this question: "How do people think about complexity?"

He said that physicists think about complexity differently from just the rest of us. He asked the question, "What system is more complex?" I'm not going to do this to you now, but the physicist would say it would be system A because it has more degrees of freedom. I would say that for me, system B is all that noise when I go into your company: the consultants come in, and I don't do this, "How good is your CMDB?" "It's fine, John. Move on." "How are you doing on security vulnerability?" "Fine, John. Move on." I don't do that, because I learned my lesson, but that's all the arrows. That's a system B. Remember I said earlier, you can't lead with agile, SAFe, devops. Those are the arrows. Somebody's giving you their version of the complexity. I don't want that. I know your house is burning down. I know you're not just sitting there drinking coffee. I know your CMDB is terrible. I know that your cloud native and your ITIL staff are on different floors and they don't speak the same language and they take different elevators.

What I need to do is get the system A stuff going. That's what I'm doing. I'm trying to go in, and I don't want you to give me your abstracted answers. I've got to pull this out, I've got to gamify it. This is what Eliyahu Goldratt talks about in "Beyond the Goal": how do you really understand complexity? He would say there is no such thing as complexity. That's a longer debate.

Free water for anybody who can tell me what this is related to. It's hard. It's a terrible question to ask you. I'll make it a little easier. It's the Struts 2 vulnerability that happened in 2017, the one that took Equifax down. Basically, it was a component called Jakarta. If you stubbed malware code into the request, and it hit an authorized machine, you're off to the races, kill chain, you're dead. Basically, that's how you could attack. We've done workshops on this. In general, this is basically how the Equifax breach started. Now, back to the "This is fine" dog with the tea and the fire. The CEO said, "The Equifax breach was on a single person who failed to deploy the patch." Done. End of story. No, not going to accept that.
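The publicly documented shape of that exploit (CVE-2017-5638) was an OGNL expression smuggled into the Content-Type header. As a minimal sketch - not from the talk - here's the detection side of that idea in Python, flagging header values that carry expression markers where a MIME type should be:

    import re

    # CVE-2017-5638 rode in on headers like:
    #   Content-Type: %{(#_='multipart/form-data')...}
    # The telltale is an OGNL expression marker where a MIME type belongs.
    OGNL_MARKER = re.compile(r"%\{|\$\{")

    def suspicious_content_type(value: str) -> bool:
        """Flag Content-Type values carrying OGNL-style expressions."""
        return bool(OGNL_MARKER.search(value))

    assert suspicious_content_type("%{(#_='multipart/form-data').(#cmd='id')}")
    assert not suspicious_content_type("multipart/form-data; boundary=xyz")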

The good news is sometimes our government gets it right. Last year, 2018, the government did a great write-up of the incident in all its glorious detail. The kill chain. I mean, more than you'd ever want to know. They got it right on Equifax; a lot of details, really cool stuff. The thing is, what's interesting about this thing is that it's kind of a tale of two stories. It goes through all these things that happened, and even in the report it says, "At the end of the day, the person who should have patched the service didn't patch the Tomcat service."

In there, there's all this juicy stuff. Before we get to Conway's Law, one of the things that's in there is that the intrusion detection systems, the things you pay millions and millions of dollars for on the edge to detect anomalies and stuff like that, had certs that had been expired for 18 months, every one of them, on the day of the breach. Now, that wouldn't have stopped it, but it took four months before they actually figured out what was going on. Think about the $10 million you spend on boxes that were running but couldn't work, because somehow the automation to update the certs wasn't there.
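That kind of rot is exactly what a few lines of scheduled automation can catch. A minimal sketch, assuming ordinary TLS endpoints (the hostnames here are placeholders, not anything from the report):

    import socket, ssl, time

    def cert_days_remaining(host: str, port: int = 443) -> float:
        """Days until the host's TLS certificate expires."""
        ctx = ssl.create_default_context()
        with socket.create_connection((host, port), timeout=5) as sock:
            with ctx.wrap_socket(sock, server_hostname=host) as tls:
                not_after = tls.getpeercert()["notAfter"]
        return (ssl.cert_time_to_seconds(not_after) - time.time()) / 86400

    for host in ("ids-sensor-1.example.com", "ids-sensor-2.example.com"):
        try:
            days = cert_days_remaining(host)
            if days < 30:
                print(f"ALERT: {host} cert expires in {days:.0f} days")
        except ssl.SSLCertVerificationError:
            # An already-expired cert fails the handshake itself - the loudest alert.
            print(f"ALERT: {host} failed TLS verification (possibly expired)")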

There's a notion of Conway's Law, we hear it all the time, but it's not just monoliths and microservices. It really does stem from organizational design. What was really interesting in that Congressional report is, what does Conway's Law have to do with their org chart? The CSO reported to the chief legal officer. In fact, there's this thing called pluralistic ignorance. That's another cognitive bias. During the testimony, the CSO is asked, basically, "Didn't you think there was something wrong with this?" Her response was, "They knew what they were doing. Equifax, big time job. I just figured they knew what they were doing." Here's where it gets even uglier. When she was asked, here's what it says, "Is there any particular reason why you did not report to the CIO your belief that the PII may have been exfiltrated in connection with the security incident?" "I don't remember a particular reason about that. I just didn't think about it."

Of course you didn't, Conway's Law. It's gloriously punching you in the throat; she reported to the chief legal officer, not the CIO. In fact, the CIO went on vacation the day they found out, a 30-day cruise somewhere in the Mediterranean. It's in the report. Imagine if she worked for the CIO and could have said, "PII, don't go on that cruise."

Capital One. "The attack itself was carried out by a former Arizona employee, who after leaving the company in 2016 maintained knowledge..." Bullshit. It was the metadata server, and a default WAF. In fact, I don't know anything, but if they were running Cloud Custodian or Guard Duty, it would have caught it. I love how the press made it like there was this FBI agent and this woman who was a genius, and it was like a Tom Clancy novel. Think about it. What was in 2016 that was so spectacular about hitting the metadata server because somebody put a default WAF in and forgot to configure it? There's nothing Clancy enough about that. I mean ,that's it, right there. My kids could do it. Curl, something that Capital One, the WAF has by default, bypass left on. That dollar URL is, "I don't care what you're doing there. I'm going to metadata server."

Here's the thing. I did external forensics. I wasn't in Capital One, I don't know exactly what happened in Capital One. I have other bank customers that gave me a lot of intel, and I think I'm pretty close to the mark here on what happened. Capital One is amazing for in-process process, Cloud Custodian. They paved the way for banks doing devops and all this stuff. This happened to be a one-off. For whatever reason, they had to do an exception.

This is going to sound really weird, hurt your brain. You need to have a process for out-of-process processing. They didn't have a process for exception out-of-process processing. I know that sounds insane, but the truth of the matter is, for everything they would have done in process, they're really good at this stuff. For whatever reason, they had to go out of process to do this, and I could imagine the guy sitting around: "Has anybody used a WAF lately?" "I used one five years ago. Wasn't it ModSecurity?" "It was ModSecurity." "What about an instance?" "We've used this a bunch of times." Authorized group. The next thing, you've got bypass turned on, on a default WAF, on an authorized system. What do you do? You hit the metadata service and say, "Give me everything you got, bud."

Once you get into that, the metadata server thinks you are whoever you say you are. It's called server-side request forgery. This is our biggest problem in our industry right now. The bad guys just figure out how to fake who they are. Encryption at rest, I don't want to get into it, but if the 100 people that are coming at you actually can decrypt, the bad people will basically figure out how to SSRF that person. This is a real problem. I'm not even saying I know how to solve this, but if you're on an authorized system and you hit the metadata server, you're getting some really interesting stuff.
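The metadata-service weakness he's describing is well documented for AWS: with IMDSv1, a single forged GET returns role credentials, which is exactly what an SSRF forwards; the newer IMDSv2 requires a session token fetched with a PUT and a custom header, which a URL-only SSRF usually can't issue. A minimal sketch against the standard endpoint (this only runs from inside an EC2 instance, and assumes the requests library):

    import requests

    IMDS = "http://169.254.169.254"

    # IMDSv1: one GET returns role credentials - exactly what a forged URL fetches.
    # requests.get(f"{IMDS}/latest/meta-data/iam/security-credentials/")

    # IMDSv2: a PUT with a special header is required first, so enforcing v2
    # blunts the classic SSRF-to-credentials path.
    token = requests.put(
        f"{IMDS}/latest/api/token",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": "21600"},
        timeout=2,
    ).text
    roles = requests.get(
        f"{IMDS}/latest/meta-data/iam/security-credentials/",
        headers={"X-aws-ec2-metadata-token": token},
        timeout=2,
    ).text
    print(roles)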

What I do is I spend a lot of time just talking to a lot of people. Nowadays, I really do focus on security, but going in, I'm "Yes, this all looks like a security problem." That's my book. It's funny. The first day I go in, people are making fun of me. Nobody wants to talk to me. "Notepads? What are you doing? Haven't you heard of computers?" When I'm done, that book, at one of the largest banks in the world, a CIO offered to pay me 10 grand for it. You don't get the book, because the book is the truth of what all those people tell me. I promise people, when they tell me the truth, I'm never going to reveal what they said. I'll say what they said, but I'll never point to who said it. You can't buy that book. It's funny.

On the last day, everybody wants to talk to me. It's so funny. I get queued up. I get COOs asking, "Does he know anything about agile budgeting? Can I talk to him from 9:00 to 10:00 before his flight out the next morning?"

Two Days With Leadership

This is something I learned from Kevin Behr. What I have to do first off is get some contextualization on what I think I'm going to hear. I talk to the edge. I want to just talk. I talk to the CIO and the leadership just to hear what they think, but I know it's wrong. Or, it's not wrong, they just don't know. This trick that Kevin Behr, a "Phoenix Project" co-author, taught me: I get on the phone, I do these all remote too. They'll say, "John, here's what I know." "No, I don't want to hear what you do. I don't care what you do. I want to ask you this question. Give me the five things you think this company should be doing that you're not."

They rattle things off and they're like, "I can't believe I told you that." Or you get "Wow, that was therapeutic." Now, I've got the framing for how to go in and try to figure out what's going on, and then I just spend a lot of time. I don't waste my time. I want to make sure I really cover the security stuff. The top three things I always find, and these sound like platitudes but they're not, are toil, risk and inconsistency. Again, I've got a 400-page book with notes and color codes going down to the page number and the quote. I would never say this, "Bill said on Tuesday at 3:00 that, by the way, you don't actually add policy because you don't want to add any more audit constraints."

Quickly, the toil. When it comes to security, I've got a longer version of this, the dependency problem. The dependency map is insane. I don't care how much you spend on scanning, the signal-to-noise ratio is broken. There are five nested dependencies. There are actors that are actually planting nefarious code down the chain. I don't know if you heard about the crypto miner node thing. Somebody took a year to get commit authority, because they knew there was a bitcoin wallet that used that node module. That should scare the hell out of you. Those are the actors we're playing with right now.

ITIL and toil. This is back to: you really think you're doing all this stuff, but you're really just adding 25% of your budget in toil and not getting any safer. You're filling out forms, NFRs, all these things. If you watched Damon's [Edwards] presentation, it's about John Allspaw's idea of incidents as unplanned investments.

Moving on – risk. Martin Casado is pretty much credited as the person who defined software-defined networking. He started a company called Nicira, sold it to VMware, where it became NSX. Now he's at Andreessen Horowitz. He said this about four or five years ago. He said that in our industry, we spend 80% of our security budget on the perimeter, and less than 20% inside the perimeter. The truth of the matter is, I've done a fair amount of study of adversaries. I wouldn't say I was an expert. They never heard of Barracuda or Palo Alto, they don't come in through the edge. They don't have to, because these companies are so bad at their hygiene, you can sneak in the building. You can tailgate in. You can find an empty cube. You can start nmapping.

The most dangerous points of your company are not the things at the edge. They're that somebody left an instance up over the weekend, forgot to kill it, with an old Jenkins server on it, and bang, you're kill-chained. Low attestation. I'll talk about attestation efficacy. Then, configuration management is our new world of danger. A lot of the volume has been about library scanning, vulnerability scanning, all this stuff. Now, a misconfigured container or a misconfigured Kubernetes, these are the new risk profiles. By the way, don't think that the adversaries don't know about these misconfiguration opportunities. I'm not just talking about resetting the default password. I'm talking about very complex configuration items: maybe you have everything battened down on a Docker container, but then you forgot to turn the flag on in Kubernetes, and by the way, those security settings don't work now. There's some really complex configuration in there. Don't even get me started about helm charts. Again, there's a lot of stuff out there.
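A minimal sketch of the kind of cross-layer check he's describing, over a parsed pod spec. The field names are real Kubernetes fields; the policy itself is a simplified assumption, not anything from the talk:

    # Assumes the pod spec is already parsed into a dict, e.g.:
    #   spec = yaml.safe_load(open("pod.yaml"))["spec"]

    def risky_flags(spec: dict) -> list[str]:
        """Flag settings that quietly undo container-level hardening."""
        findings = []
        if spec.get("hostNetwork"):
            findings.append("hostNetwork: true exposes the node's network")
        for c in spec.get("containers", []):
            sc = c.get("securityContext", {})
            if sc.get("privileged"):
                findings.append(f"{c['name']}: privileged container")
            if sc.get("allowPrivilegeEscalation", True):
                findings.append(f"{c['name']}: privilege escalation not disabled")
            if not sc.get("readOnlyRootFilesystem"):
                findings.append(f"{c['name']}: writable root filesystem")
        return findings

    print(risky_flags({"hostNetwork": True,
                       "containers": [{"name": "app", "securityContext": {}}]}))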

Then consistency. That's also about inconsistent environments. I'll go into large banks and I'll ask the security team, "Right now, can anybody tell me all of the high and critical vulnerabilities that you have running in customer-facing systems right now?" If they can't, "I don't know what you mean," there's a whole sit-down discussion we'd have. If they say, "Yes, I do," I'm, "Good, we don't have to do it. I trust that you do that."

"Do you have escalation policies?" "Yes." "How long do you wait till..." That's where they fall flat. These certain things, they take architectural design changes. That's not even the thing, because that's everywhere. The thing I love then, the next question I ask is, "Do you have a different policy for test versus production?" "Of course we do." I'm "You forgot that part about the adversaries not coming in through the perimeter. They sneak in." Your most vulnerable systems are those test systems. You should equally have your policy across dev and test. I would argue you should have stronger policies, because that's the one place. You can do everything perfect.

The analogy somebody gave is, it's like water going down a hill. You can stop it, but it's going to keep finding its way down the hill. That's what we're facing with these adversaries. If you want to know somebody who's amazing at this, it's Shannon Lietz.

The Deadliest Disease

Let's get into the deadliest diseases. Again, I've got longer presentations on that; my "botchagalupe" GitHub project has pretty much every presentation I've done in 10 years. Let's talk about the deadliest disease. I'm giving you the shorter version of the seventh deadly disease because I want to focus on security, but it is about how you work through the system. How much work do you document? How come certain work doesn't get documented? Who made that decision? There's a lot of things that go into it. How do you think about complexity and failure zones? Do you embrace blamelessness, psychological safety? If you saw Damon's [Edwards] presentation, it's just about everything he covered.

One of the things at a high level, we start thinking about, sort of, DevSecOps or shift-left auditors or shift-left security. I always like saying this joke, everybody laughs. Somebody said I should make a T-shirt out of it. How much you suck is a multiple of how many review boards you have. I've seen XML review boards, architecture review boards. It's Wednesday and it's raining review boards. With that one, I'm joking, but I've seen 5, 6, 7 review boards to get code into a system in a bank. They all happen instantaneously at the same time in microseconds. No. They have all these overlapping properties. This one's on Wednesday at 3:00, and if you get that one on time, you might get the Friday one. If you miss that one, you've got to wait until the following Wednesday.

Just thinking about the review board, it might take a week to code, it might take another five weeks to fill out the forms that you've got to fill out. Then, it's another eight weeks to get through all the review boards. I'm not joking.

Checkbox compliance. I'll get into this subjective nature of how we do audits today. We create change tickets. Bob says, "I'm going to do this." Bob has somehow figured out the whole complexity of a system, and he can document in two paragraphs how it's going to affect these incredibly complex systems. I'm joking, he can't, but he does anyway. He passes that on to Sue and Sue's "You know what, Bob? Can you add one more thing about the..." Then, this chain of events happens, and it's just like a telephone game of humans trying to describe impossibly complex systems.

Humans can't describe the complexity of the systems we manage, we just can't. Not in large enterprises, not in companies that have basically 2.7 trillion in assets. No human could. What happens though? The auditor comes in and says, "I want a change record for this." "Sue told Bob to tell Tom to tell..." "Can I see the screen prints?" I'm not kidding. They don't even trust the system. It's what I call subjective attestations. That's how we pretty much operate.

Vulnerability theater. How much do you spend on vulnerability scanning? Five million? I've got seven million, and I got this one and this one. I don't care. You can AI and ML it, you can load test it, you can basically juggle dynamite with flamethrowers. It is basically impossible to detect across all that complexity and variation. You look at some of the software supply chain reports about how much open source is being used out there. You write 10 lines of code, and with dependencies it's basically a million lines.

Some of those dependencies are five levels deep. I mean, your scanner is not going to catch it. In fact, it's even worse. I hear people tell me, "I hate those scanners." "Why? Why do you hate the scanners?" "It keeps telling me I've got SQL injection, and I don't do any database calls." There are some good methods out there, and you've got to do it, but if you think that's your backstop for safety, big trouble. In fact, all the things that you do that are not making you safe are what I call a negative risk ROI. A, you're not getting any value out of it, and B, you're stealing time from what could be positive risk ROI.
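The "five levels deep" point is easy to make concrete. A minimal sketch, assuming you've already flattened a lockfile into a name-to-direct-dependencies map (the package names here are made up):

    # Hypothetical dependency graph, e.g. extracted from a lockfile.
    DEPS = {
        "my-app": ["web-framework", "logger"],
        "web-framework": ["http-parser", "template-engine"],
        "template-engine": ["sandbox-utils"],
        "sandbox-utils": ["deep-leaf-pkg"],
    }

    def max_depth(pkg: str, seen=frozenset()) -> int:
        """Longest chain of transitive dependencies at or below pkg."""
        if pkg in seen:  # guard against cycles
            return 0
        children = DEPS.get(pkg, [])
        return 1 + max((max_depth(c, seen | {pkg}) for c in children), default=0)

    # A scan that only looks at declared, top-level dependencies sees 2 of these.
    print(max_depth("my-app"))  # 5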

This is something I did a while back, and we've iterated on it. This is the kitchen sink of DevSecOps. We're going to throw everything at it. I think a lot of what we've done and learned over the years is we have this kind of notion of a supply chain, and we have these gates: green, green, red, go back. Green, green, green, red, go back. We constantly iterate to get our feedback and improvement. Really, a lot of what DevSecOps is about is just adding those security things into those red gates, so it's just one stream. That's what this is all about.
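A minimal sketch of that gate pattern: each stage is a gate, a red sends you back, and the security checks are simply more entries in the same list rather than a separate process. The gates themselves are hypothetical placeholders:

    from typing import Callable

    GATES: list[tuple[str, Callable[[], bool]]] = [
        ("unit-tests", lambda: True),
        ("dependency-scan", lambda: True),   # security lives in the same stream
        ("static-analysis", lambda: True),
    ]

    def run_pipeline() -> bool:
        """Run gates in order; the first red stops the flow for another iteration."""
        for name, check in GATES:
            if not check():
                print(f"RED at {name}: go back, fix, iterate")
                return False
            print(f"GREEN: {name}")
        return True

    run_pipeline()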

I don't want to overdo it or take too much credit for it. I was involved in the early discussions of DevSecOps, just like I was at DevOps. I think it's taking off on a course I'm not crazy about. I've been really focusing back on automated governance, and I'll get back to it in a minute. Here are just a couple of things I want to talk about, DevOps, SecOps operational tips. This came up in one of the open spaces yesterday. If you're going to start up a security project in devops and pipelines, get the policy people in the conversation at story building. Literally invite them in, let them help you design. You don't want to finish and then go to them and say, "What do you think?" "Terrible." Actually get the policy people in early, or your internal auditors. I make fun of review boards, but banks have ORMs and stuff like that that are really important; banks can go out of business.

We'll talk about subjective attestation. Ruthlessly eliminate false positives. You have to be really good at that. A lot of systems will allow people to whitelist findings. Nexus has all these things. Then I go in and people are "Yes, I keep sending it to them and it never gets whitelisted." Again, these are socio-technical problems. The technology's there to whitelist the stuff that you don't want to see all the time, because you know it doesn't affect you, but your system is so broken that, first off, you don't have authority to change the system, the vulnerability scanning thing, and then the people you send it to never seem to put it on the list. Make sure that you don't have different systems for your security stuff. James Wickett says, "A bug is a bug is a bug is a bug." If you use Jira, use it for everything. You shouldn't have a different system for vulnerabilities.
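One way to keep a whitelist from rotting is to make every suppression carry a reason and an expiry date, so suppressed findings resurface for review instead of hiding forever. A minimal sketch (the finding IDs and dates are made up):

    from datetime import date
    from typing import Optional

    # Each suppression says why it exists and when it stops being trusted.
    ALLOWLIST = {
        "CVE-2017-0001": {"reason": "no database calls in this service",
                          "expires": date(2020, 6, 1)},
    }

    def actionable(findings: list[str], today: Optional[date] = None) -> list[str]:
        """Drop findings that are suppressed and not yet expired."""
        today = today or date.today()
        return [f for f in findings
                if f not in ALLOWLIST or ALLOWLIST[f]["expires"] <= today]

    print(actionable(["CVE-2017-0001", "CVE-2019-9999"], today=date(2020, 1, 1)))
    # ['CVE-2019-9999'] - the suppressed finding comes back automatically in June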

Automated Governance

The part I wanted to talk about was this automated governance. The idea was, how do we change subjective attestation, the Bob, the Sue, the whatever, to objective? Attestation is evidence that you've done the things that the policy people say you should be doing. Objective attestation.

Earlier this year, I put together a project. It was a three-day working group with Nike, Capital One, PNC Bank, Marriott, and Mike Nygard at Sabre. The idea was really threefold. One, a lot of our DevOps arrogance: we say, "Get rid of that CAB," the change advisory board, and people say, "Yes, how? I still have to do audits. You don't give me the how." The first thing about this was the possibility of defining a reference architecture so I can say, "Get rid of the CAB, and here's how." One. Two, the efficacy of most audits is terrible. I mean, they're just terrible. You look at what people are doing, especially when you start talking about cloud native and deployment, and you look at what the governance and rules policy are for the audits, and they're so disconnected.

I've had horrific stories. One of the ones I love is, there's a team in a bank that told me, "John, if there's one thing you can do for me: every time we have a new project on Amazon, I have to do a whole write-up and present it to somebody about the business continuity of S3." Now somebody should laugh right now, because I think it's like 11 nines. I didn't take math. I didn't do well. That's like every molecule in the universe except one, and then you lose your data. They have to have that discussion every time they start a new release. Efficacy.
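The joke lands because the arithmetic is absurd. A quick back-of-the-envelope, using the 99.999999999% (11 nines) durability that AWS publishes as S3's design target:

    durability = 0.99999999999          # 11 nines, per AWS's S3 documentation
    annual_loss_rate = 1 - durability   # 1e-11 per object per year

    objects = 10_000
    expected_losses_per_year = objects * annual_loss_rate   # 1e-07
    years_per_lost_object = 1 / expected_losses_per_year

    print(f"one object every {years_per_lost_object:,.0f} years")
    # one object every 10,000,000 years

That's AWS's own framing: store 10,000 objects and expect to lose one every ten million years - not a discussion that needs re-litigating at every release.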

The third is - and this is glorious - most companies take 30 days a year to do audits. Screen prints aren't all that glorious. What if we can turn the efficacy from 30% to 90%? What if we can get rid of the CAB? What if we can turn 30-day audits into half days? That was the goal of this project. It actually all started with Topo Pal at Capital One, who wrote a blog article way back when about how they built pipelines. There was this thing about control points, and this is very related to SRE. It's not really SRE. SRE has come up a lot today. In order to get auto-deploy-on-commit privileges, you had to show that you, as a developer, could evidence these things. At the time, it was like 16 or 17 control points; now it's like 30. This is all public knowledge. The blog article is all public.

The point is, if you think about the SRE contract, you've got a service. You give it to me as the SRE and ask me to manage it. I look at it and say, "No, man. You've got to put some more NFRs and operational things in there so I can manage it. You do that, we make a contract." This was a starting point of how they were doing it pre-SRE, but the thing about this is, this evolved into, "Wow, if you're doing this anyway, what if we could take..." Those things were negotiated and agreed to in principle with the policy people. If you've got all that, and you're evidencing these things, and we tell you these are the things we want you to evidence, why not start shoving them in a data store and turn those into real objective attestations?

Pro tip number one, don't use the word blockchain in a bank, because there are some teams - "Who said that?" - and you don't have it anymore. It's basically some form of [inaudible 00:33:05] in a list. Now, if you want to go all in on blockchain, fine; I'll show you what we did. We kept it simple. The first thing we did on this project is we had to figure out how we were going to define the pipeline. We didn't want to create the millionth-and-one version of here's how a DevOps pipeline looks. That wasn't the goal.

What we had to do is say, "If we were going to think about an objective attestation model, what would be the best way to contextualize it for our purposes, not for all of you?" How do you really define what the flow is? This is what we came up with: source, build, dependency management. Those had their own loop on their own. The package stage was interesting too, because in the world we're in now, in most cases, everything is a package. It's the jar file, it's a container image, or it's some type of AMI or instance that you're going to use on a cloud.

We realized that's a good one to have attestations around, just packaging. You don't normally see that as a separate stage. Then, you have non-prod and prod. That's what we did. In fact, this took the longest time. All these experts and brilliant people in the room, and it literally took us a half day just to agree on what would be the right model. The idea was that in the process, we would define particular attestations that would then be hashed and sent into an attestation recorder, and then have some immutable record.
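Here's roughly what "hashed and sent into an attestation recorder" can look like. This is a minimal sketch, not the working group's actual implementation: each record carries a digest of the evidence plus the hash of the previous record, so history can't be silently rewritten:

    import hashlib, json, time

    chain: list[dict] = []  # stand-in for a real immutable store

    def record_attestation(stage: str, control: str, evidence: bytes) -> dict:
        """Append an attestation whose hash links to the previous record."""
        prev_hash = chain[-1]["record_hash"] if chain else "0" * 64
        body = {
            "stage": stage,                      # e.g. "build"
            "control": control,                  # e.g. "unit-test-coverage"
            "evidence_sha256": hashlib.sha256(evidence).hexdigest(),
            "timestamp": time.time(),
            "prev_hash": prev_hash,
        }
        body["record_hash"] = hashlib.sha256(
            json.dumps(body, sort_keys=True).encode()).hexdigest()
        chain.append(body)
        return body

    record_attestation("build", "build-log", b"...tarred build log bytes...")
    record_attestation("package", "image-signature", b"...signature bytes...")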

By the way, at that point, this whole thing started for me because I did a DevSecOps presentation at JFrog a while back, and Kit Merker, who was at Google and then went to JFrog, comes to me after my presentation and goes, "Have you heard of Grafeas?" I'm "Yes, I don't know. What does it do?" He said, "This would be perfect for what you're talking about." I've got to thank him, because I started down this path.

It was an open source project from Google. It's really a reference architecture. It was really simple for us, because it allowed you to store the attestations. It's like a CRUD model for metadata. It was really designed to do a lot more, but for our purposes, it was really nice. In the prototype, we took the idea of one simple Java microservice, get client name or something like that, in a container that we would try to put in Kubernetes, and used Kritis, which is a real simple tool that works well with Grafeas.

We'd fill out these things and it'd either get in or not – we were just trying to prove, does this make sense? Literally, we started on Monday with "This might be the craziest idea, and in a day and a half we're going to walk out and get mad at John. Why did you waste our time?" We really didn't know if we were actually going to be able to get a working model. The good news is we did. It's great to have UML people on the team. We came up with this model, which is the basic guideline for each stage. You have your risk controls, you have an input/output, actions and actors. We built the model. That book that I pointed to is Creative Commons licensed. You can download it at IT Revolution.

At the end of the day, it was really cool. If you read all the attestations in the book, no one customer will do all of this. It was a kitchen sink, with Marriott and everybody throwing in what they did. Actually, Microsoft was the other one. Sam Guckenheimer at Microsoft was on the project too. Azure DevOps? Have you heard of it? He was behind it.

You look at what we went through, and the risks. The real cool thing is the controls were the attestations. Peer reviewed; there had to be some percentage of unit test coverage; clean dependencies; scan for sensitive information; static code analysis. Input was a request for change, output was a new version, plus the actors and actions. Then, it went to the build stage. You go through the risks. The controls were build configuration in source control. It had to come from that. No other way, of course. Immutable build on output, upstream approved dependency store. There was unit testing, there was linting, and there was static security analysis.

With six companies putting all theirs in, you end up with something like 75 attestations; you probably wouldn't start with all 75. Think about a cherry-pick of those being immutable SHAs. Imagine one, the build log. You tar the build log. You SHA it and you put it in the chain. You tar the commit log. You SHA it and put it in the chain. This is the reason why, when an auditor comes in, "Let me see," here's the token. It's immutable, it's math, and the only way it breaks - where everything else in the bag breaks - is if RSA encryption is dead and quantum supremacy has won. Of course not.
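That "if it's broken, everything else breaks" property is just the hash linkage. Continuing the earlier recorder sketch (same assumed record format), the auditor's check amounts to recomputing every hash and every link:

    def chain_intact(chain: list[dict]) -> bool:
        """Recompute each record's hash and its link to the previous record."""
        prev = "0" * 64
        for rec in chain:
            body = {k: v for k, v in rec.items() if k != "record_hash"}
            expected = hashlib.sha256(
                json.dumps(body, sort_keys=True).encode()).hexdigest()
            if rec["prev_hash"] != prev or rec["record_hash"] != expected:
                return False  # tampering anywhere invalidates everything after it
            prev = rec["record_hash"]
        return True

    assert chain_intact(chain)
    chain[0]["evidence_sha256"] = "00" * 32   # simulate a doctored build log
    assert not chain_intact(chain)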

That's the point. We've talked to auditors about this. Gene Kim has run these projects at enterprises where he's brought in the big five, and we've had discussions. They like this idea. At first I thought they were going to hate it. They don't want to be in your place for 30 days. They'll charge you the same amount of money even if they were only there half a day. They love this idea. We went through each one of these. The dependency management stage. This is all in the book. Download only from approved sources. License checking. Security check and quality.

Aging – what one of the companies had was a control on the aging of how long something sat in the repository. Brilliant. By the way, the Creative Commons book is like a course, because every one of these is described by these experts. It isn't just a blog. Every one of these has a little paragraph on it. I mean, you could hand this to a junior person and say, "Read this and come back." "You just came out of college. You work for me. Read this first, come back, let's have a conversation." Just that alone is incredible.

The only credit I get is for getting all the people in the room. The package only goes to a pipeline or a trusted dependency. It had to have vulnerability scanning. Depending on if it's a container, was it digitally signed - Notary or something like that - and unique versioning. Metadata. I don't want to look like I'm an expert on Netflix, but they had a real cool system where they launch an AMI and a lot gets filled out in metadata: you populate everything at runtime, or a lot of the configuration, from ZooKeeper.

One of the companies that did this project did the same thing with containers, where they built an everything-you-need container with their own libraries and all that, and they populate 60% of it at runtime from something like a ZooKeeper or a metadata server.

In their world, they had to make sure there was an attestation of where that metadata came from. Somewhere in here there's one about secrets. Are you using Vault or something like that? You can think of all these things that you'd want. In one case, we didn't go as far - we had two and a half days - but we just used Kritis to say, "If anything is broken in the chain, the policy administrator won't let you into Kubernetes." The idea is, you could actually break the build if you did a more mature version. I want these people to continue to do this, and I want a lot of people to join me.

We tried this reference architecture in three different ways. One was Grafeas. Another was Hygieia, and one was a homegrown system. We verified that it actually works on three different systems. What I'd like to do is say, "Bring your system and let's verify it."

Again, trusted metadata. Nobody's forging the metadata. Who's populating it? Vault's easy, because it's pretty straightforward, but if you're putting sensitive data or operational data in a running container, you want to make sure that the metadata server that's driving that is secure, authorized, and has all the things it needs. Artifacts stage: only allow from upstream trusted packages, you see that all there. It's immutable at this point, has a retention policy. Then, I didn't do the non-prod slide. Non-prod is definitely different than prod, but: only trusted source artifacts, allowed configuration for production, encryption, secrets, tokenization. This is where the metal meets the road.

Security, intrusion detection, identity threats, testing, promotion, quality gates. Is it operationalized? Does it have AppDynamics or something like that built in so it can expose data? Drift management, change orders, production access. Is there a break-glass type scenario set up? Again, we weren't saying anybody should do all 75 of these, but the point was these are five really big companies running really big infrastructure in production systems. I mean billions of dollars' worth of revenue through Kubernetes. One company in particular, $60 billion worth of production-facing revenue through their Kubernetes.

Now, I'm not going to make fun of KubeCon, but when I hear kids standing on stage talking about how resilient their version of Kubernetes is, I think of this one company that runs $60 billion - and it's supposed to hit $100 billion by the end of the year - of their company's revenue through their production-facing Kubernetes system. Some of their attestations are in here.

Creating Trust in the Deployment Pipeline

There were some other slides I wanted to add; they're actually in the book. One of the guys, John Rzeszotarski at PNC, was my first victim in this. "Do you want to do this?" He was "Yes." At first he was "I don't get it," and then he went nuts. John is this guy that, I say, Kafkas everything. I suspect if he has an automated coffee pot at home, he somehow figured out how to use Kafka with it.

We finished this thing in the three days, and then he actually went back and started to really implement his own version of this. This was the reference architecture. We actually got all that captured back in the book. The first thing he said is, "I need a guaranteed delivery mechanism." We actually have a working prototype described, and not only is the attestation done with Grafeas and then encrypted, but he's also built a Kafka guaranteed-delivery model around it.

Marriott documented their version. There are, again, multiple versions. This thing's got legs. One of the things I like is, I'll attribute it back to JFrog, they're really nice to me. I like their culture, I love their CEO. I don't make any money for saying this, but the one thing I like is, there's a book called "Liquid Software," and Baruch is this unbelievable developer advocate over there. When it first came out, he wanted a quote from me and he gave me the book. I've got my day job, I've got my family, and then I've got to read books for friends.

The thing is, a lot of books kind of suck. I'll just be honest with you. I thought "Liquid Software," this is going to suck. Sorry, I love you, Baruch. I kept promising him, "Damn, I promise I'll read it next time." I read it and I loved it. It's very metaphoric, but it's a brilliant, beautiful book about this world that we're going into. You're not going to learn how to do Kubernetes, but they do show a little about Grafeas and attestations. They did document that. It's a quick read, and I think it sets the stage for not only everything being liquid, but how we're going to run what I'm talking about, stuff like automated governance - if you saw Damon's [Edwards] presentation earlier, what he was talking about. I just think this book sets a nice metaphor, liquid being the metaphor for how we're going to do software.

 


 

Recorded at:

Mar 09, 2020
