Reflecting on a Life Watching Movies and a Career in Security

Summary

Jason Chan talks about some trends in the movie industry that relate well to similar changes in technology and security. He also runs through some tips and lessons learned to help security teams stay ahead as they navigate technical and operational changes.

Bio

Jason Chan leads the teams at Netflix responsible for corporate infosec, product and application security, privacy engineering, studio information security, security operations, infrastructure security, detection engineering, threat intelligence, and incident response.

About the conference

Software is changing the world. QCon empowers software development by facilitating the spread of knowledge and innovation in the developer community. A practitioner-driven conference, QCon is designed for technical team leads, architects, engineering directors, and project managers who influence innovation in their teams.

Transcript

Chan: We will talk about how security teams adapt to all the changes that are happening in the software engineering and software operations world. Frankly, that sounds like a really boring and dry topic. I'm going to try to make it entertaining by looking at it through the lens of movies and through entertainment, because that's something that's near and dear to my heart. I personally love movies. I do work at Netflix. We produce quite a lot of movies. I'm first going to ask you to get in our time machine and let's travel back to the 1980s. I'm a child of the '80s. I remember when the first video rental store opened up in my town. I grew up in Upstate New York. It was really neat because you'd go in. You wander the aisles. Every few weeks or so my parents would let us actually rent something, make popcorn and have a nice movie night at home. You don't really see a lot of these movie rental stores around anymore, do you? I don't.

This one is not my childhood store. This is in Eugene, Oregon, a place called Flicks & Pics. I used to live right down the street from it. It also closed. It's interesting because this entire industry, the video rental store industry basically came to life and died off in the span of my lifetime, just really a few decades. It sounds like people are still watching movies. I know I do. Of course, we just do it in a different way. This is a bit of a dated stat. There was a day last year in January 2018, where people watched about 350 million hours of video in one day. That's a lot of content. People clearly still like great storytelling. They like movies. They like to be entertained. We just do it in a different way.

It sounds obvious. We're sitting here at a technology conference. We're in San Francisco, in the middle of Silicon Valley. It's 2019. It seems incredibly obvious that people would choose streaming over a video rental store. Why would I get in my car, drive to the store, pick something out, bring it home, watch it, and bring it back before I get a late fee, when I could just sit on the couch in my pajamas and hit play? It feels like an objectively better experience. Hindsight is always 20/20. It seems obvious. There was really a set of trends that happened over a few decades that brought forth an environment where streaming video could become a dominant mode of entertainment.

Secular Trends

I want to talk about these trends. We're going to talk about what we refer to as secular trends. A secular trend is a long-term directional trend. It's non-seasonal and non-cyclical. A good example is the automobile. After the car was invented, it pretty much was just up and to the right in terms of adoption. People just buy more cars. Compare that to a seasonal trend. A good example of a seasonal trend, at least in the United States, is eggnog. People don't drink a lot of eggnog unless it's a holiday. It's pretty flat most of the year. Then you see a spike around the holidays. It goes back down, and just repeats year-over-year. That's seasonal. Secular is just one direction. The key to thinking about secular trends is that they have a pretty massive impact on incumbents and competitors in the space. Just going back to the car example, think about what happened to the horse and buggy industry after cars were invented. It also moved in one direction. It was a different direction.

Reliable Internet

First off, to be able to stream video on the internet, you need the internet. That makes sense. What we're talking about here is reasonably priced, high bandwidth, reliable broadband to the home. Certainly, over the last few decades, that's really increased. If you're going to be sitting on the couch and you want to buy a movie, or rent a movie, or subscribe to a service like Netflix, you need to have some way to pay for it, so e-commerce. I don't think anyone would doubt that e-commerce has exploded over the last couple decades. There are so many different ways to pay. Mobile is an interesting one. People like to watch content at home, but more and more you see folks watching content on their mobile phones. I'm sure most of the folks in this room are carrying around some smartphone capable of watching content.

Then last but not least, is this idea of a subscription economy, or what you might more generally call access versus ownership. Basically, the idea is if you have access to something, you don't really feel the need to own it. For me, my best example would be music. I moved last year. I had probably about 500 or 600 CDs. I got rid of them during the move because I have Spotify. I have most of the music I need. I have felt no need to actually own the physical media. You put all these things together and then you have an environment where it makes sense for streaming video to be successful and to have these video rental stores go away.

Security Development Lifecycle Era

You're asking, what does this have to do with security? I know that QCon is well known for having the most intelligent conference attendees of any conference. I'm sure many folks have already made a connection. It is very similar. Security has seen a number of these secular trends in the technology industry that are really forcing us to change. We're going to get in another time travel machine, and we're going to go to a different era. This time it's the late '90s, the early 2000s. At that time, I was a security consultant. I spent about three years doing security consulting for Microsoft. Does anybody remember what the state of security in Microsoft software was in the late '90s, early 2000s? It wasn't great. There were really serious security issues: Nimda, Code Red, SQL Slammer, very serious security issues. It actually got to the point where in 2002, Bill Gates, who was leading the company at the time, wrote one of the most famous memos in security history, what's now known as the trustworthy computing memo. In that memo, he said, "As of now, Microsoft's top priority is security." Working in the industry and working for Microsoft at the time, it was really pretty inspiring. Because we felt, "Maybe there's a light at the end of the tunnel. Maybe we're going to get past this era of vulnerabilities in software." Of course, that was 20 years ago. I think we still have vulnerabilities in software. There was a lot of energy and a lot of resource put behind guidance and tools to make software more secure. I refer to this era as the SDL era of security. SDL is an acronym for Security Development Lifecycle. It's also the title of a really seminal book in software security. It generally refers to this overall approach of how we were thinking about building secure software, again, in the late '90s, early 2000s.

Then things started to change in terms of how we build software. The first one that I would note is just the amount of software that's out there today, if you think about IoT, Internet of Things. How many IP addresses do you have in your house today versus 10 years ago, or 15 years ago? I'm not a big IoT person. I have a Fitbit scale. I have a Nest thermostat. I have a smartwatch. Think about how much software is in a Tesla. It's pretty amazing just the sheer amount, the volume of software that we're dealing with.

Software Delivery

Then, of course, the way that we deliver software. When I was working at Microsoft, we did security assessments on products like SQL Server and Microsoft Exchange, different operating systems. There was a time when software vendors actually put their software on these round things, called CDs or DVDs, and actually shipped them to their customers. The customers had to put them into a computer they owned in a data center that they also manage. That seems really unusual now. Of course, now, most software is delivered over the internet: web applications, APIs. Companies started to realize, "I don't need to have this six month cycle of getting disks printed and shipped. I can change software pretty much every day." Then that really brought along this idea of, how can we deliver software even faster? There's this whole set, and I'm sure you've heard a bunch of it at the conference this week, all mechanisms to deliver software faster: continuous deployment, microservices, all these techniques. Again, secular trend, just growing over time. Then, of course, the cloud. If you think about the public cloud, how much easier it is to deploy software in the cloud versus procuring your own data centers and managing. Then last but not least, is open source. What I can do now is I can use an open-source operating system, open-source web server, UI framework. All I really need to worry about is the business logic I need. It really shortens the time to value.

Hourglass and Software Development

You add all these things together and you have a fundamentally different way of developing software. The comparison I like to use is I think of the hourglass as the way maybe we used to develop software and we used to think about software security. Think of the grains of sand as software. There's only so much that can go through. It was well coordinated, and there's that little skinny part in the middle. That's where security teams got involved. There was only so much you could do. There was only so much that could go through, with very coordinated release trains. Then I think today, it's more of an industrial sandpit. There's way more software. It's being developed, and distributed, and deployed, and operated in a highly decentralized way. In the SDL era, we were really building practices around the hourglass. How do we approach the industrial sandpit? That's where we get to this slide. It's intended to be a gravestone. It's basically saying traditional application security is no longer applicable. All of the practices that we had spent time developing have basically been made unusable because they just don't match the scale and the speed of software today.

Then, what do we do about it? If we know software has changed in the way we develop and operate, what are we going to do about that? That's what I want to spend the rest of the talk going through. I want to use some examples of some of the things that we've done at Netflix. Not thinking that folks are necessarily going to directly use that because I know every organization acts a little bit differently. I want to think more about what are some questions that security teams should be asking. What are some principles that we can start organizing around to be a little bit more successful in this new world? I want to try to keep it entertaining. I want to go back and think about examples of movies to match up to some of these technical examples.

This is the first movie, anybody recognize this one? "Dazed and Confused," this was pretty popular when I was growing up. It was basically about high school shenanigans and smoking a lot of pot. That's not what we're talking about now. I would say, "Dazed and Confused," imagine yourself, you're a software engineer. How does it feel to be on your first day at a new job? To me, I think "Dazed and Confused" is a good way to describe that. Because we hire people to do some software thing, maybe it's to build a mobile app or some recommendations algorithm. It doesn't really matter. We hire folks to do something, but then you have to understand a bunch more to actually be effective. There's the basics around what does the infrastructure look like? How do we do deployments? There are all the non-functional requirements like performance and reliability. You've got to figure out, how do you make changes? How do you do migrations? How do you do upgrades? Then of course, last but not least, is security. You've created an environment. You've hired somebody, and you're probably paying them pretty well to do this thing in the middle. Then you're making them figure out all this other stuff just to be effective and to be productive.

That's the first principle I would ask security teams to think about, is really, how can you reduce the cognitive load for developers? I'm not a psychologist. I'm not a learning specialist. My general thought of cognitive load is, what is the set of things you need to have in your brain to get something done? I'll ask you to pretend you don't know what a square is. What the shape of a square is. I was going to explain that to you. I could do that in a number of ways. I could show you this. I could say here's a square. Or I could show you that definition. They're both correct. You understand what a square is. One of them was probably a little bit easier to use and a little bit easier to get your brain around.

That's really what we want to focus on. There's an interface between security teams and engineers, and we want to simplify how that works. There are a couple of questions that I think are important to ask. One is, are you trying to make your engineers, security experts? It's a reasonable question. Do you want to train your engineers to become better at security? Then a separate question, which is somewhat related, do you want them to build and operate secure systems? At Netflix, we weigh very heavily to the second. I am not interested in making our general software engineers, security experts. If they're passionate about security, great, but I look at my team's job as enabling them to build and operate secure systems regardless of their knowledge of security. That's a fundamental underlying principle of how we operate.

One of the things that we think about are what are the functions that we can abstract so that general engineering teams don't need to worry about them? I want to use an example from our studio engineering team. You may know, Netflix is a streaming service but we also produce a lot of our own content. We have essentially a studio. We create things like "BoJack Horseman," which is my favorite Netflix original. That's my recommendation of the day. We also produce things like "Stranger Things," which is a really popular show that we have. The studio engineering team is really working on technology to make that better. If you're at all familiar with the production process, like TV shows and movies, it's a pretty antiquated process. It hasn't changed a lot in the last 50 or 75 years. Very paper based, it's very manual. It's not real efficient.

The studio engineering team's job is to really optimize production from what we call pitch to play. Pitch is when we get an idea, when somebody says I have an idea for a show. Play is when it's actually on the service and you can click play. As you would imagine, there's a lot of things that happen between those two words. There's scheduling, and casting, and budgeting. There's photography, and editing. There's marketing, all that stuff. Their job is to really innovate on this overall production process and make it more effective through technology, not only to make an individual production more effective, but to make it so we can produce a very large amount of content concurrently across the world.

Netflix Studio Apps - Zuul and Wall-E

Our studio apps work in a pretty standard way. They're pretty standard line-of-business applications. The studio user could be a production assistant. They could be a creative executive, could be an accountant, something like that. They're accessing on order of several dozen different applications to do the business of creating content. What we wanted to do was we wanted to allow these folks to be more effective and more efficient by abstracting some security functionality away. We actually did it through an existing open-source project that we had that our cloud gateway team created, which is called Zuul. Zuul is our main routing gateway. Zuul has been open source for some time. If you've ever used Netflix, your traffic goes through Zuul. Our bet was that we could repurpose this to improve the security of other line-of-business applications. We did that through a separate version of Zuul that we call Wall-E. We just inject it between the user and the applications.

The way Zuul works, and hence the way Wall-E works, is that it has a set of pre-filters and post-filters. Pre-filters are what happens to the traffic on the way in, post-filters on the way out. We just create a bunch of functionality that we plug in there: things like authentication, authorization, logging, setting security headers on traffic on the way out. If you're a studio engineering developer, all you need to do is publish your app with Wall-E. You don't need to worry about any of these security features. Worry about how to create a more effective production app. We'll worry about all the rest. That's really the principle. We're trying to get from point A to point B as quickly as possible. We're trying to make things simple for our developers.
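To make the pre-filter and post-filter idea concrete, here is a minimal sketch in Python. It is not Zuul's or Wall-E's actual API (Zuul is a Java project); the `Gateway`, `Request`, `Response`, and `Rejected` names, the filter functions, and the header values are all assumptions chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Request:
    path: str
    headers: Dict[str, str] = field(default_factory=dict)


@dataclass
class Response:
    status: int
    body: str = ""
    headers: Dict[str, str] = field(default_factory=dict)


class Rejected(Exception):
    """Raised by a pre-filter to stop a request before it reaches the app."""

    def __init__(self, status: int, reason: str) -> None:
        super().__init__(reason)
        self.status = status
        self.reason = reason


PreFilter = Callable[[Request], None]
PostFilter = Callable[[Request, Response], None]


def require_auth(request: Request) -> None:
    # Hypothetical pre-filter: a real gateway would validate a session or token.
    if "Authorization" not in request.headers:
        raise Rejected(401, "authentication required")


def add_security_headers(request: Request, response: Response) -> None:
    # Hypothetical post-filter: set headers on the way out so the app need not.
    response.headers.setdefault("X-Content-Type-Options", "nosniff")
    response.headers.setdefault("Strict-Transport-Security", "max-age=31536000")


class Gateway:
    """Runs pre-filters, then the app, then post-filters, and logs the result."""

    def __init__(
        self,
        app: Callable[[Request], Response],
        pre_filters: List[PreFilter],
        post_filters: List[PostFilter],
    ) -> None:
        self.app = app
        self.pre_filters = pre_filters
        self.post_filters = post_filters
        self.access_log: List[str] = []

    def handle(self, request: Request) -> Response:
        try:
            for pre in self.pre_filters:
                pre(request)
            response = self.app(request)
        except Rejected as exc:
            response = Response(exc.status, exc.reason)
        # Post-filters run for rejected requests too, so every reply is hardened.
        for post in self.post_filters:
            post(request, response)
        self.access_log.append(f"{request.path} {response.status}")
        return response


def production_app(request: Request) -> Response:
    # The line-of-business app contains no security code of its own.
    return Response(200, f"schedule for {request.path}")


gateway = Gateway(production_app, [require_auth], [add_security_headers])
```

With this shape, `production_app` never mentions authentication, logging, or headers. Publishing it behind the gateway is the only step its developer takes, which is the point being made about reducing cognitive load.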

Next example movie, does anyone recognize that one? I have seen "Inception" a couple times. I still probably couldn't tell you what it's actually about. It's pretty confusing. The idea is there are these multiple levels of reality and dreams in the subconscious. You could potentially inject an idea into a different level and have it escape into reality. A key theme, though, is it's very blurry. When is it reality? When is it a dream?

I'm going to take a massive leap and I'm going to compare these blurred lines to something that we're seeing now in technology. That is the blurring of lines between applications, and between software and infrastructure. If you're familiar with different architectural patterns, like the monolith, when you're building software in a monolith, you don't really worry about the network because all your code is running on a single system. You move to microservices. You, as a software engineer, have to care pretty deeply about the network conditions, because all your calls are going to be network calls. Think about immutable infrastructure. You're taking your operating system, your software, middleware, you're packaging it as a single deployment artifact, and then deploying it that way. Pretty much everything is jumbled. Then of course, probably the ultimate is infrastructure as code. Where does software end and infrastructure begin?

This is important for security teams, because back in the SDL era, we had a bit more separation. You had network security teams, and they were working on network access control, and firewalls. You had app sec teams and they were working on threat modeling and code reviews, maybe a system security team that was working on hardening the operating system. Now it's all been bundled together. Really, it's a good opportunity. It seems it could be more confusing for security teams, but it's actually this really nice opportunity to start to use that combination for leverage.

The Magic of IaaS

The example that I want to talk about is how we handle permissions in the cloud, in Amazon Web Services. With Infrastructure as a Service, it's neat to deploy an app in AWS, and to run it. You really start to unlock power and velocity when you start to use the other infrastructure services that your provider gives you. A good example, pretend you have some customer-facing application, and you need to email your customer. Back in the old days, when you managed your own infrastructure, you had to do all that yourself. You had to manage an SMTP relay. You had to handle deliverability, all those things. It has nothing to do with how your app actually works. Now with things like AWS, you just call an API and you send an email. Much simpler, much easier, you can focus on the business logic. Of course, from a security perspective, this is somewhat problematic. Because to be able to interact with those services, those instances, your applications in the middle, need to have permissions. They need to have credentials to interact with AWS. That can be a problem, because if that system gets compromised, now your attacker has access to not just that system but potentially your entire cloud environment. What we're trying to do is get the permissions that that instance has to be what we call least privilege, just the privileges that it needs to do its thing.

Cloud Based Word Processor

Let's go back to day one at a job. You've just started a new job. Your manager says, "I would love you to build a cloud based word processor." I don't know why, but that's the assignment. They say, "We use AWS. Go ahead and use as much of AWS as you can to make this thing work." You're super excited because you're like, "I get to focus just on the business logic. AWS is going to take care of it for me. Just go ahead and give me the permissions for AWS and I'll go ahead and get started." We'll tell this story through a series of emojis. This emoji is a newspaper, because you might end up on the front page of the newspaper with this policy. You don't need to know how AWS permissions work. Just know that, as security folks, when you're looking at an authorization policy and you see asterisks or stars, that's probably not a good thing.

This particular policy basically gives you complete rights to the entire AWS environment. That's pretty problematic. You're a responsible developer and you tell your security team, "I don't need all those rights. I only need to use the storage mechanism." They give you a little bit better policy because S3 is the AWS storage system. You still have more rights than you need because you can create storage containers. You can delete them. You can change all configurations. You say, "I only need a couple of APIs." They make it a little bit better. There's still room for improvement. Then they say, "Here we go." Now this is where we need to be. This is a least privilege policy. You have only the permissions you need to the resources that you need to operate against. I know what you're thinking. You're like, "That's easy. Let's just do that for all the applications." Unfortunately, if you've ever worked in a production environment, you know that it's not that simple.
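The slides showing the four policies are not part of the transcript. As a rough illustration, the progression might look like the following, written as Python dictionaries in the shape of IAM policy JSON. The bucket name `wordproc-docs`, the two S3 actions chosen, and the `broad_grants` helper are assumptions made for this sketch, not the policies shown in the talk.

```python
from typing import Any, Dict, List, Union


def _as_list(value: Union[str, List[str]]) -> List[str]:
    # IAM allows Action and Resource to be a single string or a list of strings.
    return [value] if isinstance(value, str) else list(value)


def broad_grants(policy: Dict[str, Any]) -> List[str]:
    """List the wildcard actions and all-resource grants in Allow statements."""
    findings: List[str] = []
    for statement in policy["Statement"]:
        if statement.get("Effect") != "Allow":
            continue
        for action in _as_list(statement["Action"]):
            if "*" in action:
                findings.append(f"action {action}")
        for resource in _as_list(statement["Resource"]):
            # Only a bare "*" is flagged; a key wildcard inside one bucket is not.
            if resource == "*":
                findings.append(f"resource {resource}")
    return findings


def _policy(action: Union[str, List[str]], resource: str) -> Dict[str, Any]:
    return {
        "Version": "2012-10-17",
        "Statement": [{"Effect": "Allow", "Action": action, "Resource": resource}],
    }


# Step 1: everything, everywhere (the "front page of the newspaper" policy).
FULL_ADMIN = _policy("*", "*")
# Step 2: every S3 operation, including creating and deleting buckets.
ALL_OF_S3 = _policy("s3:*", "*")
# Step 3: only two S3 APIs, but still against every bucket in the account.
TWO_APIS = _policy(["s3:GetObject", "s3:PutObject"], "*")
# Step 4: only those APIs, only on objects in one hypothetical bucket.
LEAST_PRIVILEGE = _policy(
    ["s3:GetObject", "s3:PutObject"], "arn:aws:s3:::wordproc-docs/*"
)
```

Running `broad_grants` over the four policies gives two findings, then two, then one, then none, which mirrors the "stars are probably not a good thing" rule of thumb from the talk.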

I'm going to use another little story here. This is "Goldilocks and the Three Bears." Basically, Goldilocks is wandering around in the woods. She comes along the Three Bears cabin. She goes into the cabin uninvited. She's hungry and there's some breakfast on the table. She tries one, it's too hot. She tries the other one, it's too cold. She tries the third one, it's just right. Then she just ate breakfast so she's tired. There are three bedrooms. She goes and tries one of the beds, it's too hard. She tries another bed, it's too soft. She tries the third, it's perfect. In this case, Goldilocks is the developer. She gets what she wants, ultimately. There's a lot of collateral damage there. Put yourself in the bear's shoes. Somebody just came in your house, ate all your food. They messed up your bed.

This is how it works in the real world. What security teams typically do is say, default deny. I'm not going to give you anything, tell me what you need. The problem is, when you're a developer and you're innovating, you're working on something and you're not quite sure. You don't know what you need. Then you say, "I need X." Then the security team gives you X. You start doing your thing. Then you run into some roadblock because you don't have access. You say, "Now I need Y." They give you Y. Do it again, now I need Z. There's this back and forth that is inefficient and is going to stifle innovation. It's also going to create a bad relationship between your developers and your security teams.

AWS Provides Data about API Use

The nice thing about AWS is that it actually tells you how your cloud is being used. They have a service called CloudTrail that basically tells you how your systems are using the cloud. Then we could use this. We've been using AWS for a bit over a decade. We run many hundreds and thousands of applications in AWS. We've observed those, and we have a reasonable sense of how general apps want to interact with the cloud. Then what we do, if this is your day one, and you're creating your cloud based word processor, we give you a base set of permissions. What you should know is that this set of permissions is a bit permissive. It's more than you're going to need. We start that way, because we think it's a good trade-off between velocity and security.

I mentioned, we have data, we can see what's being used. We just look at what your app is doing. We just observe. It doesn't really matter what you think you need. What you might need. I think I need this particular API call. It doesn't matter because we're going to see what you actually need. Then we just remove any permissions that you're not using. The nice thing here is you don't need to ask for anything. It just happens. You don't need to know what any of this is. You don't need to know what CloudTrail is. What IAM is. You don't need to know what policies are. You just do your thing. We're trying to keep the developer from having to worry about this thing. We've actually open sourced this. If you use AWS and you think that might be useful, I would encourage you to take a look at Repokid.
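The observe-then-remove loop can be sketched in a few lines of Python. This is not Repokid's implementation. The `trim_unused` function, the 90-day observation window, the base permission set, and the usage numbers are all hypothetical, and the usage data is assumed to have already been summarized as "days since each action was last used".

```python
from typing import Dict, Set, Tuple


def trim_unused(
    granted: Set[str],
    days_since_last_use: Dict[str, int],
    window_days: int = 90,
) -> Tuple[Set[str], Set[str]]:
    """Split granted actions into (kept, removed) based on observed use.

    An action is kept only if it was observed within the window. Actions that
    were never observed, or were last seen before the window, are removed.
    """
    kept = {
        action
        for action in granted
        if action in days_since_last_use
        and days_since_last_use[action] <= window_days
    }
    return kept, granted - kept


# Day one: a deliberately permissive base set (hypothetical contents).
BASE_PERMISSIONS = {
    "s3:GetObject",
    "s3:PutObject",
    "s3:DeleteObject",
    "sqs:SendMessage",
    "ses:SendEmail",
}

# What the word processor was actually observed doing (hypothetical data).
observed = {"s3:GetObject": 0, "s3:PutObject": 2, "ses:SendEmail": 200}

kept, removed = trim_unused(BASE_PERMISSIONS, observed)
```

Here the app ends up with only `s3:GetObject` and `s3:PutObject`, and the developer never filed a request. The window length is the trade-off knob: a short window tightens permissions sooner but risks removing something the app only needs occasionally.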

I like to use this picture just to describe the principles that we're trying to get to. I think this is an imagination of Hyperloop or something like that. I look at this, and I think, transparent. Because we have data, we're using data to make our decisions. It's high speed. It's low friction. That's the environment we're trying to create to facilitate high-velocity, high-scale software development.

The last movie here, you don't need to guess because I have neglected to cut out the title, but "The Purge." Has anybody seen "The Purge?" It's not going to win any Academy Awards. There are a few purges, but I like them. Basically, the idea with "The Purge" is there's one day a year where there are no laws. Go out, kill, steal, whatever you want to do for one day of the year. It's like anarchy. You're thinking, what does anarchy have to do with software engineering?

Potential for Controlled Anarchy

There is the potential for some bits of what you might call anarchy, or at least distributed governance. Even think about microservices. When you adopt a microservices architecture or pattern, you're basically saying you want teams to be able to operate independently. You want to move away from the central control. Then you have these operational patterns. You build it. You run it. Where, if you're building software, you're also running it. Instead of having a single ops team, you now have all different operational teams maybe doing things in different ways. You have polyglot. You have people using all different technologies. Because part of the promise of microservices is that, assuming you've maintained an API contract, you can iterate with whatever technology you want behind that. Then independent deployments, you're no longer doing a single deployment. Everybody's deploying when they want. What this does, it does intentionally decentralize governance because you're trying to unlock velocity. It does make things more complicated for security teams. Now instead of having that single hourglass, that single place to keep an eye on things, you've got things all over the place.

I'm going to use another comparison, another analogy to talk about this. I'm going to talk about eating. Have you gone to a tapas restaurant? This is a Spanish style of eating. I think it literally means small plates. If you've ever been to a tapas place, you usually go with people. You just hang out. You maybe order one thing. Then you have some wine or some sangria. Order a few more things. You might stay there a long time. The idea is that every time you go to a tapas restaurant, it's probably going to be a little bit of a different experience. It's a very high-touch, very custom eating experience. This is similar to how we were doing software security in the SDL world. It was very high-touch, very custom.

Then I would say compare your dining experience at a tapas place with a large wedding, at least in the United States. Have you ever been to a large wedding with maybe a couple hundred people? What is dinner like at that place? You don't have a lot of choice. You have chicken, fish, vegetarian. There's no substitution. You're not going to interact a lot with the waitstaff, or with the servers. That's what you need to do to be able to serve that many people all at the same time in a synchronized way. This is the approach. This is where I would say security teams need to go. You need to think less custom, less bespoke, less tapas, and more large wedding. You've got to think about scale, and you've got to think about simplifying and standardizing.

Managing the Anarchy - The Security Paved Road

Let's talk about how we do that. How we manage the anarchy at Netflix. We have a concept that we use at Netflix that's called the paved road. We have a security paved road. The idea of a paved road is that we're a central engineering team. We provide a bunch of solutions for common security problems like identity and logging, stuff like that. The idea is what the security paved road does is it clarifies what our recommendations are. It helps us evangelize these patterns. What we want to be able to do is in an automated way, observe whether or not you're actually participating in these paved road solutions. Then use that as the means of interfacing with engineering teams. What it helps us do is uncover risk and reward folks that are actually doing a good thing. Say, we had 25 different paved road practices. We can measure how much adoption we have. You're a team that's only adopted two of those practices. We're probably going to spend some time talking with you, whereas if you're a team that's adopted them all, we're going to give you a pat on the back.
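A small Python sketch of measuring adoption and using it to decide which teams to talk to first. The check names, team names, and results are hypothetical, and the scoring (fraction of checks passed) is an assumption; the talk does not say how Netflix computes it.

```python
from typing import Dict, List, Tuple

# Hypothetical paved road checks, each observable in an automated way.
PAVED_ROAD_CHECKS = (
    "uses_identity_service",
    "no_secrets_in_code",
    "central_logging",
    "approved_aws_config",
)


def adoption(results: Dict[str, bool]) -> float:
    """Fraction of paved road checks a team passes; missing checks count as failed."""
    passed = sum(1 for check in PAVED_ROAD_CHECKS if results.get(check, False))
    return passed / len(PAVED_ROAD_CHECKS)


def prioritize(teams: Dict[str, Dict[str, bool]]) -> List[Tuple[str, float]]:
    """Order teams from lowest adoption to highest (ties broken by name)."""
    scored = [(name, adoption(results)) for name, results in teams.items()]
    return sorted(scored, key=lambda pair: (pair[1], pair[0]))


teams = {
    "scheduling": {check: True for check in PAVED_ROAD_CHECKS},
    "budgeting": {"uses_identity_service": True, "central_logging": True},
    "casting": {"no_secrets_in_code": True},
}

worklist = prioritize(teams)
```

The team at the front of `worklist` is the one the security team would spend time with, and the team at the end gets the pat on the back.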

These are some example checks, things like AWS configurations. We want to make sure you're using our identity service. We want to make sure you're not storing secrets in code, those things. Another example I wanted to talk about, was what we call the quarterly change cycle. The quarterly change cycle is what we're asking developers to do is commit to updating your code at least once a quarter. If you do that, if you push your code once a quarter, you're going to automatically pull in all of the updates to the paved road components, all the upgrades, and library changes.

This is an example here. The quarterly change cycle has this idea of deprecations and blacklists. This is a service that my team runs called Gandalf. It's an authorization service. You can probably guess what that's saying there. Basically, what we're doing is we're publishing a deprecation. We're saying, if you're using this software version less than 0.16.0, you're going to need to update it. You don't need to update it manually. You just push your code. Then what we do is we have a burndown chart here. When we first published that deprecation, there's a high degree of noncompliance. Then as teams deploy code over the quarter, you see it going down. There's no specific interaction that we do. We just make our change available. We publish it as a deprecation. Teams automatically pick it up as they push code.
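The deprecation check and the burndown can be sketched as follows. Only the `0.16.0` threshold comes from the talk. The app names, deployed versions, and weekly snapshots are hypothetical, and versions are assumed to be plain dotted integers.

```python
from typing import Dict, List, Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    # Compare versions numerically; as strings, "0.9.3" would sort after "0.16.0".
    return tuple(int(part) for part in text.split("."))


MINIMUM_VERSION = parse_version("0.16.0")


def is_deprecated(version: str) -> bool:
    """True if the deployed client version is below the published minimum."""
    return parse_version(version) < MINIMUM_VERSION


def noncompliant(deployed: Dict[str, str]) -> List[str]:
    """Names of apps still running a deprecated version, sorted for stable output."""
    return sorted(app for app, version in deployed.items() if is_deprecated(version))


# Hypothetical snapshots of deployed versions over one quarter.
snapshots = [
    {"casting": "0.9.3", "budgeting": "0.15.9", "scheduling": "0.16.0"},
    {"casting": "0.16.2", "budgeting": "0.15.9", "scheduling": "0.17.0"},
    {"casting": "0.16.2", "budgeting": "0.17.0", "scheduling": "0.17.0"},
]

# Each point on the burndown chart is the count of noncompliant apps.
burndown = [len(noncompliant(snapshot)) for snapshot in snapshots]
```

Nothing in the snapshots changes because of a security ticket. The counts fall only because teams pushed code during the quarter and picked up the newer version.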

Security Brain

The other example I wanted to show is a tool that we built called Security Brain. The intent of Security Brain is to make our expectations very explicit for the engineering teams that we work with, and also to standardize the interface that we have with them. These are some screenshots from Security Brain. Basically, what you have is a per user view, or a per team view. You have all your applications in that view. It's very clear what vulnerabilities have been identified in your application. You can click straight through to Jira and see them. It also has all of the practices that we recommend, and whether or not your application is actually using them. It's very easy for you as an engineer to see, these are the things the security team wants me to do, and whether or not we've actually done them. Then it allows us as a security team to prioritize who we're going to work with. We tend to go after the folks that are fairly low on the Security Brain score.

Our intent there is that most of the asks that we have are going to be standard. We really need to be careful about how much bespoke work we do, how much custom work we do. We do a fair amount of custom work. We really want to aim that towards the teams that are building the most critical systems. For the vast majority of engineering teams that are building less security sensitive systems, we still want to give them guidance, but we need to deliver in a standard way.

That's the paved road. Really, a key there for us is to be able to gather data, to measure adoption, and to be able to use that as a standard way of interfacing with engineering teams so that we can get leverage.
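Measuring adoption and using it to decide who to talk to first can be sketched roughly as follows. This is an illustrative sketch only: the practice names, the `TeamReport` type, and the scoring are hypothetical and are not taken from Security Brain or any Netflix tool.

```python
from dataclasses import dataclass, field

# Hypothetical practice names, for illustration only.
PAVED_ROAD_PRACTICES = frozenset({
    "uses_identity_service",
    "no_secrets_in_code",
    "central_logging",
    "quarterly_deploy",
})


@dataclass
class TeamReport:
    team: str
    adopted: set = field(default_factory=set)

    @property
    def adoption_ratio(self):
        """Fraction of recommended practices this team has adopted."""
        known = self.adopted & PAVED_ROAD_PRACTICES
        return len(known) / len(PAVED_ROAD_PRACTICES)

    @property
    def missing(self):
        """Recommended practices the team has not adopted yet."""
        return sorted(PAVED_ROAD_PRACTICES - self.adopted)


def prioritize(reports):
    """Order teams from lowest to highest adoption (ties broken by name)."""
    return sorted(reports, key=lambda report: (report.adoption_ratio, report.team))


if __name__ == "__main__":
    reports = [
        TeamReport("payments", set(PAVED_ROAD_PRACTICES)),
        TeamReport("labs", {"no_secrets_in_code"}),
    ]
    for report in prioritize(reports):
        print(report.team, report.adoption_ratio, report.missing)
```

The teams at the top of the sorted list are the ones to spend time with; the `missing` list is the standard, non-bespoke set of asks for each of them.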

Overall Takeaways

First, we talked about secular trends. It's really important for security teams and engineering teams in general to stay attuned to what's going on. How do we need to adapt based on what we're seeing? We don't want to keep our mental models and never refresh them. Simplify and standardize: to me that's a real key to working in high-velocity environments. We want to be transparent. I think it's very important for security teams to be transparent. If I'm going to make a decision about something, then I want to make sure that I can explain why it's being made. We talked about that with the AWS permissions. We want to measure adoption and uptake so we know who to target. Then finally, we need to be comfortable with trade-offs. We talked about tapas versus a wedding. You might like tapas more, but we know that to scale you're going to need to make some trade-offs.

Questions and Answers

Participant 1: In the Security Brain, you mentioned that you ask people to pick up new libraries that have patches once a quarter. Could you just do that for them? You could just go to their dependency declaration and say, upgrade this version of these libraries. It would prepare a CI/CD build, run some tests. It'd still be up to them to do the whole canary thing all the way through. Would you do something like that?

Chan: That's essentially where the team is moving. Not necessarily the security team, but the engineering tools and services team. They're moving to make it as simple as possible. You still want to give the team control over when it gets deployed. The idea is to make it as simple as possible. I think we've made good progress over the years. Although I think our deployment systems have always been pretty user friendly. They're just becoming simpler and simpler, and even evolving to what you might think of as a more managed delivery approach where you can be quite hands-off.

Participant 1: The thing you mentioned, Repokid. Yesterday, I was talking to a person who gave a talk about Terraform. The biggest issue we have in that world is IAM security. People just open the policy door. I actually thought about the same problem, which is claw back. Initially, be permissive: scan, observe, and claw back. Does Repokid work with the existing infrastructure tooling, such as Ansible, Terraform, Packer, that suite of widely deployed build tools?

Chan: It does not. It should probably be reasonably easy to modify that. The way it works, it's a multi-stage process. Part of the process is we need to make sure every application in the environment is using its own unique identity, its own unique IAM role so that we can then accurately profile it. Then from there, what we're really trying to do is, behind the scenes, mine the data that we have. Then as appropriate, swap out those policies. It doesn't necessarily work with those tools. I imagine that it would be reasonably adaptable to that.
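The "observe, then claw back" idea discussed here can be illustrated with simple set arithmetic. This sketch is not Repokid's code or API. The function names and the `keep` parameter are hypothetical, and it assumes that each application has its own role and that the permissions it actually used have already been collected.

```python
def unused_permissions(granted, observed):
    """Permissions the role holds but was never seen using."""
    return sorted(set(granted) - set(observed))


def right_sized_policy(granted, observed, keep=frozenset()):
    """Return the permissions to retain after claw back.

    Retains what was observed in use, plus anything in `keep`
    (a hypothetical allow-list for rarely used but required actions).
    Never adds a permission that was not already granted.
    """
    granted = set(granted)
    return sorted((granted & set(observed)) | (granted & set(keep)))


if __name__ == "__main__":
    granted = {"s3:GetObject", "s3:PutObject", "sqs:SendMessage"}
    observed = {"s3:GetObject"}
    print(unused_permissions(granted, observed))
    print(right_sized_policy(granted, observed, keep={"sqs:SendMessage"}))
```

The per-application role matters because usage observed on a shared role cannot be attributed to a single application, so there would be no reliable profile to shrink the policy against.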

Moderator: If you were an organization that doesn't do any of these, a blank field, where would you start from?

Chan: Is it an organization that has adopted what you might consider modern software engineering and operations?

Moderator: Yes. They don't put a lot of attention on security.

Chan: They have a relatively low knowledge retention.

Moderator: Yes. What are your priorities there? From your experience, how do you decide where to start: updating libraries quarterly, or implementing some CI/CD processes?

Chan: At a high level, non-engineering specific, I would always start with making sure that the organization understands what is truly valuable in their environment, and what the worst-case scenario is. Because I think, as security folks, you need to take a risk-based view because you can never do everything. Then speaking on the technical side, I would probably focus on figuring out how to make sure I understand the environment. We think about asset inventory. Understanding what you are actually responsible for, and then figuring out how to make sure that stays updated, so patching or upgrades.

Moderator: DevOps, infrastructure as code, patching all the time?

Chan: Yes. One of the benefits that security teams have had from continuous deployment is that if you can make changes quickly in your environment, that's a very important security feature because vulnerabilities happen. We want to be able to update quickly. Any mechanism that the organization has that allows you to push changes, or config, or a new OS, those things are super beneficial and valuable to security.


Recorded at:

May 27, 2020

Jason Chan
