
A Big Dashboard of Problems: Creating Preventative Security Strategies


Summary

Travis McPeak explores the forefront of simple and effective preventative security strategies.

Bio

Travis McPeak is a security leader with over a decade of experience spanning application and cloud security. He enjoys building scalable security teams that empower developers by making security automatic and easy. Travis is a founder and CEO at Resourcely and previously led teams and projects at large cloud-first companies including Netflix and Databricks.

About the conference

QCon Plus is a virtual conference for senior software engineers and architects that covers the trends, best practices, and solutions leveraged by the world's most innovative software organizations.

Transcript

McPeak: Who here has heard of Security Monkey? Open sourced in 2014, Security Monkey was part of Netflix's Simian Army. I believe it was among the first Cloud Security Posture Management tools before they were even called Cloud Security Posture Management. What Security Monkey does is scan your cloud resources, create an inventory, and report on misconfigurations. The tool is really useful, and had it been a for-profit company, it would probably be worth a bajillion dollars today. The initial Simian Army post that announced Security Monkey actually says Security Monkey terminates offending instances. This didn't end up being true. We did, however, show all misconfigurations in a dashboard. One time we were meeting up with our friends from Riot Games; their team had flown up to hang out with our security team at the Netflix office. Both of our teams were showing each other tools that we'd built to solve a problem. One of the Riot folks told us that they created something designed to enforce tagging, which would actually terminate instances if they weren't tagged correctly. Apparently, the system had gone awry at some point and earned the unfortunate name Murderbot. Then, when it was our turn to present and we showed Security Monkey, one of the Riot folks asked, what am I supposed to do with a thousand findings? This was the first time I thought, dashboards by themselves aren't very useful.

Has anybody seen Repokid? This is Netflix's open source tool that automatically right-sizes roles to least privilege. Did you know that before Repokid, there was another tool called Repoman? My former Netflix colleague, Patrick Kelley, created Repoman, a tool that would go look at your application roles in AWS and show all of the findings to developers in a dashboard. The idea was that they would go to the dashboard, see a least privilege change that they could make, and click a button to right-size the role. What was the problem? Nobody used it. We learned a few things. One, developers don't really care about least privilege. This is a security problem. The best you can do is make it automatic for developers to get least privilege. Two, nobody wants to go to a dashboard to see problems. We need to actually do something about it. Repoman became Repokid. Repokid was my first project at Netflix. It did exactly what Repoman did, except with one big change. Rather than show issues in a dashboard with a button to fix them, Repokid made least privilege the default and allowed developers to opt out. At least Repoman tried to solve problems with a button, though.

One of the ways I like to spend time is with advising and angel investing. In order to do this well, I have to look at a lot of companies. I've probably seen about 200 pitches in the last 4 years. I'm regularly shocked at how many of these pitches are just giant dashboards of problems. As an industry, we're so busy, there's so much work to do. We literally can't pay people enough to come and do it. There were three-and-a-half million open cybersecurity jobs in 2021. Who here works in security? Yes, my people. Who has extra time that they're looking to fill with some random new fun projects? Yes, me neither. I see dashboards like this and I get so frustrated. What am I supposed to do? There are 120 vulnerabilities and eight-and-a-half vulnerabilities per host. What am I supposed to do with this information? Who is going to come to this dashboard, then do something new and useful as a result? Look at this one. This is just some internet image that I grabbed, some random thing. This has 76,000 vulnerabilities awaiting attention. If I saw this, I would go hide under my desk, or retire, or something. Things are not working. Also, I love this line at the bottom here that shows it's going to be 32 days until it closes. If I look at this dashboard, there's no way I'm thinking that this is going to zero. How about this? Nine hundred and seven total assets and 857 non-compliant assets. A pro tip: if I see 1.517k of anything, it's not actionable. Also, a side note, what are pie charts for? Are they for people that don't really get percentages? Cyber, what am I supposed to do though? Again, the internet's a bad place, it's always under attack. What do I do, just disconnect from the internet here? Today is the day. It's shields up. This is the day we all start taking security seriously, shields up day. Here's a tip for vendors: if somebody wants to bulk archive all of your findings, the product's not working.

Background

My name is Travis. I've spent most of my career leading some aspect of security at large companies. One thing I love about security, particularly at large companies, is that there's a ton of strategy involved. You have very finite resources, and security at the end of the day is a cost center. We're supposed to prevent bad things from happening in the future. Something about humans is that we're actually really bad at estimating future risk. Unsurprisingly, we in security have to fight for every dollar we get. The job in security is to mitigate risk. Let me ask you, what risk does me having a shiny new dashboard like those I just showed you actually mitigate? Unless the dashboard is just eye candy for the CISO, or a way to make other executives think that you're doing work, it's at best part of a solution, and not even the most useful part.

What's Wrong with Dashboards?

In defense of dashboards, I understand that visibility is the first requirement of security, and you can't fix what you can't see. Unless your solution is specifically an inventory solution, or you're finding something with a very simple fix, I think we need to do better. Many products are focused on identify and detect, but these are at best an incomplete solution. If your product can't protect and respond, it's probably not that useful. Even for simple dashboards, there's too much noise. If the product is telling me about something, it needs to really be important. We talked earlier about Cloud Security Posture Management, and there were two waves of them. There were what I would call v1 products, like Security Monkey, Evident, and RedLock. Then there's v2, like Orca and Wiz. These are specifically about filtering. The big thing that they added is that they take a bunch of context about your environment and tell you the things that are really important, versus everything. It's the difference between "your thing is internet facing" versus "your thing is internet facing, it has five critical vulns, and it hasn't been touched in three years."
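To make that concrete, here's a minimal, hypothetical Go sketch of the kind of context-based filtering a v2 tool layers on top of raw findings. The field names and thresholds are illustrative, not any vendor's actual logic:

```go
package main

import "fmt"

// Finding is a hypothetical shape for a scanner result; the fields
// mirror the kind of context v2 CSPM tools layer onto raw findings.
type Finding struct {
	Resource       string
	InternetFacing bool
	CriticalVulns  int
	DaysSinceTouch int
}

// worthPaging filters thousands of raw findings down to the handful
// that combine exposure, severity, and staleness.
func worthPaging(all []Finding) []Finding {
	var out []Finding
	for _, f := range all {
		if f.InternetFacing && f.CriticalVulns > 0 && f.DaysSinceTouch > 365 {
			out = append(out, f)
		}
	}
	return out
}

func main() {
	findings := []Finding{
		{"bucket-a", false, 2, 10},
		{"api-gw", true, 5, 1100}, // the one that actually matters
	}
	fmt.Println(worthPaging(findings))
}
```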

What's actually wrong with dashboards here? A few things. Unsolvable problems. I don't want to beat up on GuardDuty too much. I think it's actually a really useful product for smaller companies that need to do something for security. When I was working at Netflix, it would regularly tell me about internet facing things that had incoming connections from malicious IPs. What do I do with that? We can't block the IP. At Netflix, we actually did block IPs from time to time, and you know what would happen? Somebody at a college does something bad on the internet, and then we end up blocking the entire college IP block. Nobody at the college can get to Netflix. The problem is that attackers can change their IPs, but normal users can't.

Unactionable findings. If there's more than a thousand of something, I can't realistically take an action. I either need to do more triaging or filtering, or use this as a signal to go fix the problem upstream. In any case, the product telling me about so much stuff at one time is not a solution for me. Unimportant findings. Why does the info category even exist? Realistically, everybody in the industry ignores lows and most mediums. If you're going to be a dashboard, show me things I have a high likelihood of actually doing something about, and then help me take the action I'm trying to take. Finally, the sky is falling. I get it, security is really hard. If you're showing me a ton of problems that there's no solution for, I'd rather actually not see them. I'll focus my attention somewhere I can actually make a difference. We need to at least have an easy button, a runbook, something to do with the findings.

Hierarchy of Security Products

The real problem with these is that by the time a scanner finds an issue, it's too late. Basically, as an industry, vuln management is really hard. More than 60% of breaches involve some known vulnerability that you probably have sitting in a dashboard somewhere. Look at how much time as an industry we spend on vuln management: 443 hours per week on average, and more than 23,000 hours in a year. It's just a ton of time doing this find and fix stuff. I call this the pyramid of crap. This is my proposed hierarchy for security products. At the bottom, we of course have dashboards of problems. Hopefully you all agree with me that these are crap, and we can move on. One step above dashboards, we have dashboards with an easy fix option. Technically, Repoman would fall into this category. Some of the new Cloud Security Posture Management tools also do this. Moving up the pyramid, we have tools that will go and fix issues and then report on success. One example here is Repokid. We would go do the automatic least privilege and then tell developers when it was done. One step up from that, we can continuously fix. This is better because it shrinks the vulnerability window. One company told me that they have a set of lambdas that automatically puts non-compliant security resources back into the right state. I think this is the best level that we can achieve, as an industry, if we're not willing to change developer behavior at all. Some of the newer security solutions are advocating shift left. What this means is basically: catch a problem in the CI or test environment so you're never actually vulnerable in prod. There are plenty of examples of this. Anything you integrate into your CI is a prime example.
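As a sketch of what that "continuously fix" level might look like in Go: the resource shape and remediation below are hypothetical, and a real version would call a cloud provider's API rather than mutate a local slice.

```go
package main

import (
	"fmt"
	"time"
)

// Resource is a stand-in for a cloud resource with one security-relevant
// setting. A real remediator would list these via a cloud provider API.
type Resource struct {
	ID     string
	Public bool // non-compliant if true
}

// reconcile sweeps all resources and forces non-compliant ones back into
// the desired state, shrinking the window they spend misconfigured.
func reconcile(resources []Resource) {
	for i := range resources {
		if resources[i].Public {
			resources[i].Public = false // remediate in place
			fmt.Println("remediated:", resources[i].ID)
		}
	}
}

func main() {
	fleet := []Resource{{"bucket-a", true}, {"bucket-b", false}}
	reconcile(fleet) // first pass immediately
	// Then keep running on a schedule, like the set of lambdas
	// described above.
	ticker := time.NewTicker(time.Minute)
	defer ticker.Stop()
	for range ticker.C {
		reconcile(fleet)
	}
}
```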

These solutions are pretty good, because you don't actually have the problem, but there's yet one better option, which is the mystery. Throw the computer into the sea. As much as we all want to do this in security, you know that we can't be secure without availability, so it's not an option. Defaults are the option that we're looking for. They're so much more effective than everything else. Why is that? There's something called tyranny of the default. Basically, people just don't change defaults. There's a study from Microsoft that found 95% of Word users kept the defaults that were preloaded. In fact, there's an entire branch of economics called behavioral economics that uses defaults as a powerful tool. In a paper from the Australian government called "Harnessing the Power of Defaults," the authors describe why defaults work and how to use them.

There are basically three main categories of why defaults work: transaction barriers, behavioral biases, and preference formation. Transaction barriers basically means some combination of actual pain, or perceived pain, in changing the settings, and so you don't do it. Then there are behavioral biases, of which there are a few kinds. One is loss aversion, which basically says people are wired to avoid losing things they consider valuable. Your brain puts whatever settings you already have in this bucket. There's discounting, which basically means I'm really bad at estimating the benefit of something in the future, and pain right now feels bad. People generally apply a discount to whatever happens in the future. If I have to do something now to get something in the future, people usually don't do it. Then, procrastination. There's some cognitive load. You don't want to do something. It feels inconvenient at the moment. People just maintain the status quo. Finally, preference formation. This actually has two parts. There's implicit advice, which means that the defaults are seen as a suggestion by an expert. You think, I'd better trust the expert. Then there's experience: when you stay in a certain state for long enough, your brain develops a preference for it.

There are a few examples of this outside security that are pretty interesting. One is organ donation. Most countries want to get more people to donate organs, and you can adopt either an opt-in or an opt-out approach. Countries have done both. For countries where it's opt out, 90% of people become organ donors. For countries where it's opt in, they struggle to get to 15%. Massive difference between those two. In one study, some researchers tried to find a way to get people to use more generic drugs. The idea is that generic drugs are cheaper than brand-name drugs, so the more that you can nudge people to generics, the better for a country. What the researchers did is change the software that doctors use to prescribe so that the default was either the generic or the brand name. When the default was generic, generics were 98.4% of prescriptions. When it was the brand name, generics were 23.1%. Again, a big difference. Then, of course, salespeople know the value of defaults firsthand. This is why most subscriptions, Netflix included, helpfully renew for you. How do we use this? You won't believe this one simple trick that can make you more secure than all your friends' companies.

Secure by Default- Application Security

The rest of this talk is all about defaults. We're going to talk about different approaches and products and things that I believe have gotten this right. The first category is security of applications. This first one is an example from Segment: an open source component called ui-box. It's really used for setting up buttons and things like that. There's a property that gets attached to all buttons so that you don't have to check the safety of the link the button goes to. It'll only allow safe destinations, which are things that you would expect from a link, versus things like JavaScript exploit code. Segment built this because they kept getting hit with bug bounty reports about JavaScript hrefs, and they didn't want to keep playing Whack-a-Mole. In the first version it was opt in, and then later it became the default. Then, similarly, the web framework Angular requires you to explicitly add an "unsafe" in front of the protocol for something that isn't allowlisted. This pattern of calling something unsafe to really help developers understand what it is, is something that we see a lot, and secure by default is a really useful pattern.
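The general pattern, independent of either library, looks something like this hypothetical Go sketch (not ui-box's or Angular's actual code): links with allowlisted protocols pass through, and everything else gets the unsafe prefix, so a developer has to go out of their way to render a dangerous link.

```go
package main

import (
	"fmt"
	"net/url"
)

// allowedSchemes mirrors the kind of allowlist described above:
// ordinary link protocols pass, anything else (javascript:, data:)
// is treated as unsafe by default.
var allowedSchemes = map[string]bool{
	"http": true, "https": true, "mailto": true, "tel": true,
}

// safeHref returns the href unchanged when its scheme is allowlisted,
// and prefixes it with "unsafe:" otherwise.
func safeHref(href string) string {
	u, err := url.Parse(href)
	if err != nil || (u.Scheme != "" && !allowedSchemes[u.Scheme]) {
		return "unsafe:" + href
	}
	return href
}

func main() {
	fmt.Println(safeHref("https://example.com")) // passes through
	fmt.Println(safeHref("javascript:alert(1)")) // unsafe:javascript:alert(1)
}
```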

Cipher suites go all the way back to OpenSSL. The cipher suite lists the cryptographic algorithms that are used to exchange keys, encrypt the connection, and verify the certificates. A lot of servers leave the choice of which algorithms to support to the developer. That's a big cognitive load. Without being an expert in cryptography, it's really hard to know what to support. This becomes really important because man-in-the-middle attackers can force a cipher suite downgrade if the server supports bad options. This is a real, practical issue that developers are supposed to care about, and they really don't have good information. Beginning in Go 1.17, Go takes over the cipher suite ordering for all Go users. You can still disable suites individually, but ordering is not developer controlled. The crypto/tls library takes care of all of that for you based on local hardware, remote capabilities, and available cipher suites: all of the things that you would actually weigh if you were an expert in this. It's really nice that Go just handles it for us.
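For example, a Go server can simply lean on the crypto/tls defaults. A minimal sketch (the cert and key paths are placeholders):

```go
package main

import (
	"crypto/tls"
	"log"
	"net/http"
)

func main() {
	// Since Go 1.17, crypto/tls ignores the order of CipherSuites and
	// picks based on hardware support, security, and peer capabilities.
	// Leaving CipherSuites nil uses the library's safe default set.
	cfg := &tls.Config{
		MinVersion: tls.VersionTLS12,
		// Optional: restrict to a subset; ordering here is not honored.
		// CipherSuites: []uint16{tls.TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256},
	}
	srv := &http.Server{Addr: ":8443", TLSConfig: cfg}
	log.Fatal(srv.ListenAndServeTLS("cert.pem", "key.pem"))
}
```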

Next up is Tink. This is a project with a self-described goal of "making crypto not feel like juggling chainsaws in the dark." That really resonates with me. This is a valuable project. Even as a security person, every time I deal with crypto, I get a little bit nervous. I just think to myself, I really don't want to screw this up. I'm a security person, I'm going to look so dumb. Also, of course, I don't know all the context and history that I need to make a decision. I could spend several hours researching it and probably get to the right answer, but I might still make a mistake. Instead, what I can use is Google's open source Tink, which makes crypto "easy to use correctly and harder to misuse." This is another example of a library that comes with safe defaults baked in and prevents me from chopping myself up accidentally.
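As a small illustration, here's roughly what authenticated encryption looks like with Tink's Go library. Note that the import path has moved over time (newer releases live under github.com/tink-crypto/tink-go), so treat this as a sketch:

```go
package main

import (
	"fmt"
	"log"

	"github.com/google/tink/go/aead"
	"github.com/google/tink/go/keyset"
)

func main() {
	// The key template picks a safe default (AES-256-GCM); no cipher,
	// mode, or nonce choices are left to the caller.
	kh, err := keyset.NewHandle(aead.AES256GCMKeyTemplate())
	if err != nil {
		log.Fatal(err)
	}
	a, err := aead.New(kh)
	if err != nil {
		log.Fatal(err)
	}
	ct, err := a.Encrypt([]byte("secret"), []byte("context"))
	if err != nil {
		log.Fatal(err)
	}
	pt, err := a.Decrypt(ct, []byte("context"))
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(string(pt))
}
```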

Rails CSRF prevention does exactly what it advertises on the label: it eliminates a major class of web vulnerability. Similar to Segment's feature, it was shipped as an option at first and later became the default. I look forward to a world where developers don't even have to think about any of these web application attacks anymore, and don't even need to know what CSRF is. Learning about this and having to understand it and think about when it's happening really distracts developers and other folks from the work that we want them to be doing. I've seen a ton of applications that just pick some default password and expect that users are going to go and change it, and they don't. It's usually something like "changeme." At one previous employer, there was a major bug bounty submission about this. Basically, just the failure-to-change-the-"changeme"-password bug. Passwords are bad. I would love for us to just get away from them completely. Until that day comes, why don't we use strong pseudorandom passwords for everything? The best way to guarantee this is to simply remove that choice from the user: just generate something really good and deliver it to them. Or if they're going to pick a password, make it so they have to jump through a lot of hoops to do that. This is similar to Tink's philosophy of making it really hard to screw up crypto. This next one is an oldie but a goodie. When you use an ORM, you get a ton of technical benefits, but you also make it much harder to write raw SQL injection vulnerabilities. As we move away from raw SQL, there's nothing really to inject into, so easy-peasy, win-win.
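For the generated-password idea above, a minimal Go sketch using crypto/rand; the 32-byte length is an arbitrary choice here:

```go
package main

import (
	"crypto/rand"
	"encoding/base64"
	"fmt"
)

// randomPassword returns a URL-safe token with nBytes of entropy,
// suitable as a generated-for-the-user default credential.
func randomPassword(nBytes int) (string, error) {
	b := make([]byte, nBytes)
	if _, err := rand.Read(b); err != nil {
		return "", err
	}
	return base64.RawURLEncoding.EncodeToString(b), nil
}

func main() {
	pw, err := randomPassword(32) // 256 bits of entropy
	if err != nil {
		panic(err)
	}
	fmt.Println(pw)
}
```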

Secure By Default - Architecture

Next up, secure by default architecture. More than half of breaches involve some vulnerability that was unpatched. What can we do to make patching secure by default? One solution is making patching less effortful. Back when I worked at Netflix, they invested a ton in guiding developers to run more cattle and fewer pets. Cattle are basically replaceable. If one server goes down, you rotate it out and put in an exactly identical one. Pets are those servers that we all keep updating, and we're afraid that something might happen to them someday. Generally, we want to encourage more cattle and fewer pets. One of the ways that we did this was with immutable infrastructure. What this means is that if you want to change your system, you rebuild a new image and then redeploy it, versus having to change the software on the instance. This approach itself carries a ton of benefits. For example, if something happens to an instance, your automation can easily bring up another one. At Netflix, we would encourage this practice with a tool called Chaos Monkey. What Chaos Monkey does is go randomly do something to your instance that makes it unstable, and test your ability to recover from it automatically. If you get to the point where rebuilding and redeploying is automated, and you invest in testing and telemetry to tell you when your app is unhealthy, you can lean into auto-patching. The idea is that you constantly redeploy images with the latest software. If something goes wrong, your orchestration routes traffic to a previous version. You can easily fail back to wherever you were. This makes it really cheap and easy to try and test whether a new instance works or not.

Taking this one step further, Netflix has a system called Managed Delivery, which offloads updates from application developers to a platform that can perform them asynchronously on your behalf. In fact, Netflix invested so much here that they were able to patch many of the Log4j instances in 10 minutes, versus "weeks or more than a month" according to ISC2 data. Assuming an organization spent one week, Netflix's 10 minutes would be over a thousand times faster. An alternative approach is to simply need to patch less. One way to accomplish this is with distroless distributions. Many folks treat containers essentially as virtual machines with their own operating system images. A better way to use containers is to use the host operating system and only bundle your application and its direct dependencies. If you think about it, this approach is going to lead to less overall patching: smaller surface area, fewer things you don't need that you'd otherwise have to patch. Another example of this is serverless, such as AWS Lambda. This removes the need to do anything with underlying operating systems. Your application simply gets a runtime on top of somebody else's host, and you only bundle the app and its immediate dependencies. That's all you're responsible for patching.
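With Lambda, the patch surface you own shrinks to roughly this much code plus its dependencies. A minimal Go handler using the aws-lambda-go runtime library (the event shape here is a generic placeholder):

```go
package main

import (
	"context"

	"github.com/aws/aws-lambda-go/lambda"
)

// handler is the entire deployable unit: no OS image, no agents, no
// patching of anything below the runtime, which AWS maintains.
func handler(ctx context.Context, event map[string]any) (string, error) {
	return "ok", nil
}

func main() {
	lambda.Start(handler)
}
```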

The secure by default version of ACLs is basically least privilege. An example of this is Repokid. The way this would work is we would set deliberately broad IAM roles by default on a new application. You spin up an app, you get a role, your role has x actions. These are things that we found most apps need to do at some point in some way or another. We know that we're deliberately overprivileged here. We're giving you more than you need, for sure. What we do is observe, with data, what your application is doing over time. Over time, we learn what your normal behavior is. After some period, call it 3 months, we can remove all of the permissions that you've never actually used in that time period. When I say remove, we actually rewrite the role policy to include only the permissions that you're actively using. Before we do this, of course, we tell developers: your app is going to change, but most of the time this is safe, and you can opt out if you want. Most don't. Then we get least privilege. We actually converge to perfect least privilege over time. I want to note, though, this isn't secure by default, because the application is vulnerable for a period of time, but it is automatic security. Another tactic that Netflix uses is an empty role. Most workloads don't actually require any IAM permissions. The default for launching on a lot of systems was an empty role. For this to be usable, of course, we need to make it easy for developers to go and get new permissions they need, so we invested a lot in self-service. We can take a similar approach for security groups. I really like how AWS has default empty security groups. If something needs to talk to your app, you explicitly add it. You can take this same approach for egress. If your application needs to talk to something on the network, then you explicitly add it. Then in Kubernetes land, we should launch no-privilege containers with no host network, and force containers to run without root.
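In Go terms, using the Kubernetes API types, a no-privilege default for a pod might look like this sketch (the helper names, image, and UID are illustrative, not a prescribed configuration):

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

func boolPtr(b bool) *bool    { return &b }
func int64Ptr(i int64) *int64 { return &i }

// lockedDownPod sketches the "no privilege by default" posture described
// above: no host network, non-root, no privilege escalation.
func lockedDownPod(image string) corev1.PodSpec {
	return corev1.PodSpec{
		HostNetwork: false, // the default, stated explicitly here
		Containers: []corev1.Container{{
			Name:  "app",
			Image: image,
			SecurityContext: &corev1.SecurityContext{
				Privileged:               boolPtr(false),
				AllowPrivilegeEscalation: boolPtr(false),
				RunAsNonRoot:             boolPtr(true),
				RunAsUser:                int64Ptr(1000),
			},
		}},
	}
}

func main() {
	fmt.Printf("%+v\n", lockedDownPod("example.com/app:latest"))
}
```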

I'm a huge fan of systems that developers really want to use, but that also have awesome security baked in without them having to think about it at all. One case here is Spinnaker. Spinnaker has a ton of auxiliary security benefits, like making it really easy to deploy your application for patching. Since so many developers prefer to use it, we as a security team also get a nice injection point for secure defaults. In Spinnaker, each application launched with its own app-specific role by default. The roles made it possible to do repoing. Without them, we wouldn't be able to effectively repo. Spinnaker would make it really hard to launch instances without using the golden image, which is also a good thing. We want folks using that, so you can do central patching in one place. It also tracks properties that we care about, like who owns an application. Another example is Lemur. Without Lemur, if a developer wants a certificate for their microservice, they have to select a cipher suite, generate a private key, generate a certificate, get the certificate to the load balancer, and handle rotation. With Lemur, we replace all of that with a few button clicks. Now we get secure by default crypto algorithms, strong key storage, and an inventory. Finally, there's Zuul, and Zuul's internally facing sister, Wall-E. Both of these services have really nice security properties baked in that developers just got for free. They didn't have to worry about it at all.

Consumer Security

Next up, Time's Person of the Year: you. Let's talk about what we do for consumer security. Something interesting that I found out is that cars and car security have come a long way. Early cars actually didn't have anything built in at all to prevent theft. Then in the 1940s, carmakers started adding locks to make sure that people didn't steal your car or the stuff inside. Then fast forward 58 years to 1998, when most carmakers started introducing central locking systems. Before that you had to go door to door and unlock or lock each one. Then finally, in the late '90s and early 2000s, key fobs were introduced. Key fobs are a game changer. They were the default on 62% of cars by 2018. Fobs are really cool because they make it really easy to unlock your door, and they make it hard to lock your key in your car, so you get a double win here. Similarly, the Apple Watch can automatically unlock your machine, which makes it easy to set an aggressive password policy. The workflow is basically: you walk away, it auto-locks. You walk back up, it unlocks. Life is good. I think this GIF speaks for itself. This thing will prevent you from chopping off your hot dog. This is a really useful feature for people that do woodworking. It makes your fingers secure by default; you can't chop them off.

Chromebooks automatically update without users having to do anything, and they also have a really nice secure boot process with validation, so every time the computer spins up, it's making sure that it's running an untampered ChromeOS. This makes it really cool for my family, so they don't have to worry about viruses and scary stuff like that on the internet. The Chrome browser has a ton of security features built in. It has a really solid password manager. It can automatically upgrade all your connections to HTTPS, and it tries to use secure DNS to resolve sites. Chrome also makes it really clear when you're about to visit something sketchy. A special shoutout to Adrienne Porter Felt, who did a ton of research and work making Chrome secure by default and highly usable.

WebAuthn has a ton of secure by default properties. I think most of us can agree, passwords aren't aging well. All of us still have that loved one who uses the same terrible password for every site. I highly recommend a read of https://webauthn.guide/. According to the site, 81% of hacking related breaches use weak or stolen passwords. Developers have to figure out how to store and manage passwords securely, which is a big burden for them. Users have to figure out how to store and manage passwords, which is a big burden for them. This is the stuff that makes my family afraid to use the internet. With WebAuthn, private keys are stored in HSMs, which are systems purpose built to keep secrets safe. Users don't have to deal with keys at all. Developers only store a public key, which is deliberately public, so if it gets disclosed, it's worthless to an attacker. Probably the biggest game changer here, though, is that your private key can only be used to authenticate to the site that it's scoped for. This mitigates the risk of phishing. Phishing is so bad that it basically dominates Verizon's DBIR, which is an annual report about how breaches happen. Overwhelmingly, phishing is the lead-in to everything bad on the internet. Folks that cite DBIR stats often start by filtering out the phishing results. That's how bad it is.
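A toy Go illustration of the underlying idea, not the actual WebAuthn protocol: the private key never leaves the authenticator, and the server stores only a public key and verifies a signature over a fresh challenge.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/sha256"
	"fmt"
)

func main() {
	// Authenticator side: the private key lives in the HSM or secure
	// element and never leaves it.
	priv, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		panic(err)
	}

	// Server side: issue a fresh random challenge per login attempt.
	challenge := make([]byte, 32)
	if _, err := rand.Read(challenge); err != nil {
		panic(err)
	}

	// Authenticator signs the challenge.
	digest := sha256.Sum256(challenge)
	sig, err := ecdsa.SignASN1(rand.Reader, priv, digest[:])
	if err != nil {
		panic(err)
	}

	// Server verifies with the stored public key. Stealing the server's
	// database yields only public keys, which are worthless to replay.
	fmt.Println("authenticated:", ecdsa.VerifyASN1(&priv.PublicKey, digest[:], sig))
}
```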

Here's a cool example. AirPods have a feature that warns when you're about to leave them somewhere. This is really good because these things fall out of my pocket all the time. Both Office 365 and GSuite which are two of the biggest hosted email providers, have malware and phishing prevention built in and on by default. This makes it way safer for people to use the internet. Extending this one step further is one of my favorite vendors, Material Security. Material noticed that attackers often compromise email and then dig out sensitive information to use in future attacks. What Material will do is automatically quarantine the info and require a second factor to get to it. This way, you still have the info you need in your email, but attackers can't easily use it against you.

What's the Point?

What's the point? I want to move the industry up the pyramid. Visibility of security issues shouldn't be a viable product. Let's move up the pyramid and have more Murderbot. If your product requires me to throw a bunch of ops at it, I'm not going to buy it. Defaults are powerful. Let's make it really hard to do the wrong thing. You should have to go way out of your way to hurt yourself in security. Users have it hard, let's do everything we can to make their lives easier. Finally, an ounce of prevention is worth a pound of cure.

Takeaways for Devs, and Recommendations

I know you all are developers, so what would I tell you? First of all, you're responsible for the security of your application. It's not the security team. They will help you do it, but you're responsible if your application has problems. It's security's job to make that really easy for you. We should guide users to make safe choices by default, and then let them opt out if they want to do something unsafe, versus the other way around, where you have to explicitly set up security features. I would suggest walking through the setup of whatever product you're dealing with, whatever you own, with your users, and observing their struggles. One thing that I saw at a company one time: we actually wrote down the steps that it takes for someone to launch an app, and it was over 30 pages long when we documented it. By just observing the pain that your users have to go through, you can see a lot of these rough edges and potential for defaults. If there's a clear best practice, just make it the default. Then finally, some recommendations. Chromebooks all the way for your family. WebAuthn if you're dealing with auth. Golang if you're writing a new application; it's got a ton of nice libraries and defaults baked into it. Privilege-less containers, firewalls, and RBAC.

Questions and Answers

Knecht: What stops people today from being able to move beyond the dashboard of problems?

McPeak: Actually, just because of what I do for my day job, I've learned a lot about this. I do something that's beyond a dashboard of problems. I think security has a traditional mindset of, we don't want to go and screw with engineers too badly. We want to just buy something, show risk or whatever, and then call it a day. When security does interact with engineers, a lot of times it's through Jira. Security requests that you do some item, they make a Jira ticket for you, and they'll nag you to do it. That's it. They don't really get too involved in the flow of engineers. This is speculation on my part, but I think part of it is because a lot of times security doesn't understand engineering that well, and they're afraid to go and mess with stuff. It's a political-capital-driven organization, as all central teams are, so they really just don't want to impact their customers. They're a little bit concerned to go and change things. That's why I think it happens.

Knecht: You said security doesn't understand engineering that well. What's one of the things that folks can do to break down that barrier, if that's something that people are facing either in their job, or as they're approaching solving security problems at work?

McPeak: I think the most high-impact thing that security can do is go out and partner with engineers. Go have lunch with them, make friends with them, understand what it is they're working on. Build that dialogue and trust so that, first of all, you understand what it is they're doing and how they're doing it. I also think that there's a ton of benefit in hiring engineers into the security team. Like I said in the presentation, we're short three-and-a-half million jobs. I see a lot of engineers who are quite interested in security. I think you can do it a few ways. You can do a rotation system, where somebody comes in, does security for a year, a year-and-a-half, or whatever, and they get a skill and you get engineering exposure. I met somebody at re:Invent who is an engineer and is really interested in moving into security, even if it's not permanent. I think both of those are really good. Then there's just where you source folks. I think security folks with some engineering background are going to have, at a minimum, more empathy for what their engineering counterparts are going through, and definitely more willingness to get in there and integrate with engineering systems.

Knecht: You spent a lot of time in your talk discussing bad dashboards. Are there good dashboards, or are there places where you think dashboards can actually be useful?

McPeak: Totally. I think dashboards are most useful when you're trying to answer a question and you have a lot of data. I think Aditi uses dashboards to great effect, where it's like, we don't understand what happened here, and we want to collect up all the info and then be able to slice and dice. That's a perfect use of a dashboard. Any kind of data analysis discovery work is perfect for dashboards. Then, at the end of the day, we do want to present the progress of our work. If you can quantify it, like, we have x many vulnerabilities, and then through effort we've reduced that to 30% of x, I think everybody is going to understand that. The point is that the dashboard itself isn't a solution; you need something on top of that. A lot of times these dashboards present so much of a problem that folks give up, or they don't make much progress. I think that's the part that I'd like to see change.

Knecht: I've definitely spent a lot of time looking at dashboards and despairing. Yes, I think moving beyond that, and moving towards solutioning makes a ton of sense.

What are the most important things that developers can do and should do to enable security from the ground up in their products? Then, what should we be aware of when it comes to building defaults in the systems that we build?

McPeak: I would advocate that security's number one job is really to empower developers: to have whatever they're building be secure without them having to learn a ton about security. Hopefully, your security team has brought you at least some recommendations and things that you can use to mitigate certain kinds of problems, like Tink, CSRF prevention, stuff like that, and has some really clear guidelines so that it's not a big project for you to onboard them. The other thing, from the engineering side, is really to think like an attacker. You have more context on the system that you're building than anybody does. If you were an attacker, what would you go after in that system? What are the ways that you think folks will go after it? Attackers are always going to take the easiest path. What is the easiest path to get the most sensitive data, the keys to the kingdom? If both sides can meet in the middle there, security builds these systems and practices, and developers also put on their attacker hats, you can see a ton of good outcomes for relatively little effort.

Knecht: As an engineer, how do you explain to security that the industry defaults are probably the best approach? They had a report from the security team where they needed CSRF protection on all the requests, and then it turned out that the added CSRF protection had a whole bunch of bugs, and more is not always better.

McPeak: Yes, that's annoying. Using CSRF protection on GET requests, that's a pretty common example. I think sometimes security people tend to fall for an absolutist mindset, like it has to be perfect. In the CSRF-on-GET case, it does really nothing. That's not the point of CSRF protection. Yes, we can't blindly follow industry advice. It's a starting point, and it's a shortcut for us to get up to speed on an issue. I'm sorry that the security team asked you to do that. That was a pointless request. It's worse than that, because it took time and attention that you could be spending on something impactful and put it towards just blindly following this advice. I think that's annoying.

Knecht: You mentioned that developers are responsible for securing their application and the role of the security team is to help. Would you agree that it is ok to sometimes release applications with known non-critical CVEs? Do you see it as a judgment call in most situations?

McPeak: Yes, 100%. No question. I think, first of all, every organization, if they're a microservice org, has several applications with unpatched vulnerabilities. If they tell you they don't, then they're not telling the truth. That's the way of the world. I think drawing a line on action at the CVE label, it's a critical versus it's a high or whatever, is going to lead to some bad outcomes on the margins. Understand that sometimes there are so many vulnerabilities that severity is a helpful sorting feature. The real source of truth here is that you look at the impact in your application, and then you do an assessment. Hopefully, you fix everything eventually. Hopefully, the answer is just patching, or auto-patching, or whatever. Since we're all time limited, you definitely want to look at the most impactful things. Every organization can and should release things that have non-critical vulnerabilities. The reality is that a lot of these things don't have known attack patterns. Attackers have so much more low hanging fruit that the likelihood they're going to go after your particular vulnerability, unless it's a really mission critical system, is pretty low. One of the things that we did at Netflix is we would risk adjust these things based on not just application context, but what data does it have access to? What kind of an account is it in? What isolation mechanisms does it have? Is it using a locked-down IAM role? Is it using locked-down network access? We had this account called Wild West. It's basically just people screwing around with whatever. A critical in that thing, to me, is infinitely less important than a medium in the front door API gateway for the main prod account. Always keeping the surrounding context in mind is pretty important.

Knecht: My thought there too is, many times with known CVEs, there is not, to Travis's point, even an attack path for those things. Getting into auto-patching as much as possible makes a whole lot of sense. Then, if you can't, then looking into, does this actually affect me, or is this just present, but not exploitable?

McPeak: Sometimes the vulnerability is in something that you're not actually using. One of the things I'm most excited about in security is to actually use the context of the environment to up and downgrade those things.

Knecht: What other types of things are you excited about in the security industry that's coming out these days?

McPeak: If you had asked me earlier, I would have said secure by default, those kinds of things. I think the OpenAI GPT stuff is nuts. I saw Joel de la Garza wrote on LinkedIn that with OpenAI GPT, we went from basically a car phone to the iPhone in a week. It was just like the world was totally different. The ability of it to spot application-level vulns, or cloud infrastructure-level vulns, is really impressive. AI has always been this thing that snake oil vendors kind of bolt on to their product, and they're like, look at us, we're worth five times more than we were before we said that. Now I think this is serious stuff. I'm fascinated with how AI is going to change the game, both from the attacker side and the defender side. I think obviously OpenAI is going to do everything in their power to not let this become an attacker tool. I've already seen some interesting bypasses where it's like, write me a story where an attacker is attacking a web application vuln and writes code to bypass whatever. If you have it pretend, then it'll go around OpenAI's don't-be-evil measures. Then, similarly, the ability of that thing to spot traditional AppSec vulns is really impressive. I think a lot of businesses are going to be shaken up by this. A lot of jobs are going to be impacted. I think for both the attacker and defender side, we're going to be way more effective than we were without this.


Recorded at: Sep 12, 2023
