
Vulnerabilities and Risks in the Software Supply Chain

Apr 22, 2022

Podcast with

Brian Fox

Shane Hastie

Shane Hastie spoke to Brian Fox of Sonatype about vulnerabilities and risks inherent in the modern software supply chain and how to overcome them.

Key Takeaways

The way modern software is developed these days is like a manufacturing process and understanding the software supply chain is crucial
Libraries and components come from many different sources and those sources could be compromised
Many organizations don't even know what components and libraries go into the applications they build
Provide guidelines and clear rules that give people a framework to make good decisions rather than trying to come up with a simple list of approved components
Exploits are getting more sophisticated which requires more sophisticated approaches to preventing and protecting against them


Introductions [00:56]

Brian Fox: As you said in the lead-in, I'm co-founder and CTO here at Sonatype. Before that I was heavily involved in the Apache Maven Java build tool project, which led to the founding of the company, what, 15 years ago at this point. For the last 15 years I've been trying to help organizations do a better job of managing their software supply chain, even before everybody really was thinking about it that way.

Shane Hastie: Delving into that one, when we talk about the software supply chain, some of our audience will be, "So what." What is the software supply chain?

The software supply chain [01:27]

Brian Fox: The way modern software is developed these days is similar to manufacturing processes. Take your car: the auto manufacturer doesn't take iron ore and beat it into steel and then cast the pistons and put them in the engine. They outsource parts from other suppliers who are really good at doing those things. That's how our software is developed these days, and that's a trend that's really taken off over, say, the last 20 years. When I started my career that wasn't so much the case. We wrote a lot of the code from scratch, but with the rise of open-source and with the rise of tools like Maven, the NPM package manager and similar ones for other ecosystems, it's become much easier to consume already compiled binary components, so that today's development task is largely assembling components, wiring them together, adding your business logic and shipping it.

Just like manufacturing before, we as an industry have gained significant productivity increases because everybody doesn't have to figure out every problem. You can stand on the shoulders of giants. So that's how modern software is developed, but just like in the physical world that creates a supply chain and now you're using components that have come from elsewhere. There are sometimes issues with that, but that's what we're talking about when we talk about a software supply chain.

Shane Hastie: So let's delve into that components from elsewhere. Yeah, I just download a library from GitHub or wherever and it's working fine, incorporate it. What could go wrong?

Vulnerabilities in the supply chain [03:00]

Brian Fox: Everything. Back in December, I think the world was awoken again with the Log4j event, and for those that don't know, Log4j is a project from Apache. It's a very popular logging framework. Logging sounds super simple on its face. How hard is it to write out a message to either a screen or a file? But what if you need to rotate the log files? What if you need to time stamp them? What if you want to turn levels on and off? You need to deal with multiple threads logging at the same time. It turns out it's a boring, but actually pretty complicated problem. Log4j's been around for a very long time. It's probably the most popular. In fact, not probably, it is the most popular Java logging framework. In terms of popularity among all Java components in the Maven Central repository, which Sonatype runs and which contains pretty much all the world's open-source Java, Apache Log4j is in the top 0.3 percentile. It's as popular as it gets.

So what you end up having is you have this consolidation of usage, of risk in single places. It's what engineers might call a common mode failure. If everybody's using the same part, there's a defect in that part, that's a challenge. It's also an opportunity from an attacker's perspective and so when you're using components from other people, the problem largely is that so many organizations just don't even know what parts their developers have put into the applications. The issue is often not the case that the component was really faulty or that it was terrible. These cases do exist. The challenge is if you accept that all software is going to have bugs because all software is written by humans and humans make mistakes. The question then becomes, what do you do when that happens? It's no different than auto manufacturers.

If they don't know what airbags they shipped in their cars when a company like Takata has an issue with some of them, how do they know what to recall? That's the equivalent of what is really going on in the software world these days. We would not tolerate this from our physical goods or for our supply chain. Companies that had no idea what was going into the things that they sell to us would not be around very long, but because of the rapid rise of the component consumption of modern software, the leaders in organizations don't recognize it and the processes haven't kept up with that and that leads to this inability for organizations to rapidly respond when, not if, but when there is a vulnerability or a bug in a component because they have no idea that they're affected.
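The recall analogy above comes down to keeping an inventory of which components each application ships, so that "are we affected?" becomes a lookup rather than an email survey. A minimal sketch in Python, assuming a hypothetical in-memory inventory; the application names, component names, and version sets are illustrative only and are not taken from any real tool:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    name: str
    version: str


# Hypothetical inventory: application name -> components it ships with.
INVENTORY = {
    "billing-service": [Component("log4j-core", "2.14.1"), Component("guava", "31.0")],
    "web-portal": [Component("log4j-core", "2.17.1")],
    "batch-jobs": [Component("commons-lang3", "3.12.0")],
}


def affected_applications(inventory, name, bad_versions):
    """Return the sorted names of applications that ship `name` in one of `bad_versions`."""
    return sorted(
        app
        for app, components in inventory.items()
        if any(c.name == name and c.version in bad_versions for c in components)
    )


if __name__ == "__main__":
    # Illustrative set of versions to recall; a real list would come from an advisory.
    print(affected_applications(INVENTORY, "log4j-core", {"2.14.1", "2.15.0"}))
```

With the inventory in place, the question an organization has to answer during an incident is a single call to `affected_applications`.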

Shane Hastie: So that's a fundamental shift in the way that we think about these components, both at the individual developer level, but also at the bigger organization. The development shop is no longer the black box. There's actually a big risk factor associated there. How do we make that change?

The lack of visibility to make good choices [05:48]

Brian Fox: I think the challenge has been largely that organizations haven't provided the right visibility to developers to make the right choices. I end up in a lot of conversations with organizations that imply that developers don't care, that they're just grabbing whatever willy-nilly and that's not accurate. Most developers, like artists they want to see their thing and they're proud of it. They want to be proud of the thing that they're producing. Nobody's intentionally choosing to build crappy software but if you're an organization and the only thing that you're pushing your developers to do is ship software faster you get what you measure. That's a well known thing and if you're not able to measure the nonfunctional things, like what is the open-source license? Is this license going to get us sued for copyright violations?

Does this component have known vulnerabilities? Does the project have a history of a lot of known vulnerabilities, in terms of the quality of the project, the hygiene, the reliability of it, the architecture, the popularity, how many other people in the world agree that this is a good component. When you don't have that visibility and now you put your developers trying to quickly figure out what to do it's akin to, they use what they know, they use what their friends know. They might go on a forum and ask a question. You leave a lot to happenstance. The better approach is to actually provide that visibility via tooling to the developers so that they can understand what's going on and also importantly, to be able to encode the organization's opinion on these things. In some cases, depending on let's say if you're distributing software, if the software you're writing is being shipped to a customer that would tend to trigger copyleft types of clauses.

Examples of issues that can occur without visibility [07:29]

Brian Fox: So you maybe can't use GPL components in those because it would say, if you're shipping this you have to release your source. It's a viral type of thing that tends to happen, but if you're writing it for a service, you're not actually shipping anything. Those copyleft clauses don't get triggered and so you probably can use GPL. Depending on the context of the application, what components might be allowed or not allowed are different. You need to be able to encode those things into a system so that now when your developers are there trying to make decisions, they can do so eyes wide open, not only with the attributes of the component in mind, but how it fits into what your organization has said is important. Because absent that, what I see a lot of times is that it's just simply ignored, but you might find one developer on one project who in the past, worked at a company that was super strict on licensing.

Brian Fox: So they've become like the licensing expert and pay a lot of attention to that. But then you've got another application team where maybe there's a person who likes security and pays a lot of attention to that. Again, you leave a lot to happenstance and so if you want them focusing on the things that matter for your business, you need to provide that visibility. You need to provide that prioritization and then you need to measure it. Everybody needs to be on the same page. In some ways it's obvious and yet so many organizations just simply don't do it. I think largely because the leaders aren't really aware of this componentization problem. They're just worried about how quickly can I ship my software.
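The idea of encoding the organization's opinion so that the same component can be acceptable in one context and not in another can be sketched as a small rule. This is a minimal illustration in Python, assuming a hypothetical `license_allowed` function and a hypothetical set of license identifiers; it encodes only the distributed-versus-service distinction described above and is not a complete licensing policy:

```python
# Hypothetical set of licenses the organization treats as copyleft.
COPYLEFT_LICENSES = {"GPL-2.0-only", "GPL-3.0-only"}


def license_allowed(license_id: str, distributed: bool) -> bool:
    """Apply the context-dependent rule from the discussion above.

    Copyleft licenses are rejected only when the application is shipped
    to customers; for an internally hosted service they are accepted.
    """
    if license_id in COPYLEFT_LICENSES and distributed:
        return False
    return True


if __name__ == "__main__":
    print(license_allowed("GPL-3.0-only", distributed=True))   # shipped product
    print(license_allowed("GPL-3.0-only", distributed=False))  # hosted service
```

Because the application context is an input to the rule, every team gets the same answer for the same situation instead of depending on whoever happens to know about licensing.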

Shane Hastie: One of the pushbacks I've heard is that, as you mentioned, some programmers treat themselves as artists. They want that creativity. They want that choice, and now we are narrowing those choices for them.

Finding the right balance between choice and constraint [09:03]

Brian Fox: It goes both ways. I've seen organizations that, let's call it a naive attempt to solve this problem have said, well, we'll fix this, we'll remove all choice from the developers and we will give them a list, an approved list, a denial list of components. Well, one they're kidding themselves because most of the time these same organizations don't actually have a way to accurately assess the bill of materials in the applications. I see so many organizations that'll say, oh, this problem should be easy. We have 800 things on our allowed list. That's the only open-source that we're allowed to use and then we go in and we actually use our tooling to tell them what's actually in there and it's 10 times, a hundred times that because how do you know if they're following the rules if you don't even have a way of checking it. You're not measuring it.
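Measuring an allow list against what is actually in the applications is, at its simplest, a set comparison. A minimal sketch in Python, assuming the real component names have already been extracted by some tool into a set; the function name `audit_allow_list` and the report keys are hypothetical:

```python
def audit_allow_list(allowed: set, actual: set) -> dict:
    """Compare the approved component names with what applications really use."""
    unapproved = actual - allowed      # in use but never approved
    unused = allowed - actual          # approved but not found anywhere
    compliant = actual & allowed
    compliance = len(compliant) / len(actual) if actual else 1.0
    return {
        "unapproved": sorted(unapproved),
        "unused_approvals": sorted(unused),
        "compliance": compliance,
    }


if __name__ == "__main__":
    report = audit_allow_list(
        allowed={"log4j-core", "guava"},
        actual={"log4j-core", "struts", "commons-io"},
    )
    print(report)
```

The `unapproved` list is the part an organization cannot see without tooling, which is why a list on its own does not tell you whether the rules are being followed.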

Again, you get what you measure. So if you're not measuring the consistency against that allow list, you don't actually know, but worse, if you actually can lock it down, now you're arbitrarily restricting the developer's ability to keep up with industry trends and what's even worse again, is that oftentimes they get locked down to specific versions. When I'm warming up for conference talks and waiting for the audience to come in I always ask the question, who in here is using Struts 1? Struts 1 is an open-source framework for developing Java applications. It's had level 10 vulnerabilities that have been published and fixed for 20 years at this point. Seriously, it's a very long time. Nobody should be using that really anymore. There's always somebody in the audience who raises their hand and says, we are, and I ask them why.

They say, because it's on the approved list. The approved list was checked and validated two decades ago, the thing was on it and everybody assumes it's still okay. I like to say that components age like milk, not like wine. If you're not keeping up with this stuff, things are going to go bad. With the naive attempt to create an approved list, either (a) you're kidding yourself and they're using whatever they want and just not telling you, because you can't validate it, or (b) you've locked them down in such a way that you've broken their back. They've given up and now they're using old versions because they have to. Those are not the developers you want to keep around because the good developers will not stick around in an environment like that where they're asked to fight blindfolded with both hands tied behind their back using age-old technology.

Nobody wants that and again, if you can encapsulate the policy in tooling, then you can provide the guardrails and you can provide a degree of freedom for developers to innovate and use innovative components, modern components, subject to some sane policies. Those policies can include, you can't just go randomly grab new technologies. It doesn't have to be white or black. It could be red, green, yellow. Red being this is inconsistent and don't even ask the answer is no for whatever reason. Green might be, yeah, we're already using this and yellow could be, well, we're not using it yet, but it is otherwise consistent with our policy so maybe let's have a conversation. That provides that degree of freedom for somebody to look at this and go, okay, maybe I'll put the energy into seeing if I can bring the organization forward using some new technology, but I'm also doing it eyes wide open understanding that it is not yet blessed, but it is likely it should be approved because it's not banned.
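The red, green, yellow scheme described above can be sketched as a three-way classification. A minimal illustration in Python, assuming the policy check has already produced a boolean and that the organization keeps a set of components already in use; the names `Verdict` and `classify` are hypothetical:

```python
from enum import Enum


class Verdict(Enum):
    RED = "red"        # inconsistent with policy: the answer is no
    GREEN = "green"    # already in use in the organization
    YELLOW = "yellow"  # not used yet, but consistent with policy: have a conversation


def classify(component: str, violates_policy: bool, in_use: set) -> Verdict:
    """Map a component to the three-colour scheme described above."""
    if violates_policy:
        return Verdict.RED
    if component in in_use:
        return Verdict.GREEN
    return Verdict.YELLOW


if __name__ == "__main__":
    in_use = {"guava", "log4j-core"}
    print(classify("guava", False, in_use))
    print(classify("shiny-new-lib", False, in_use))
    print(classify("banned-lib", True, in_use))
```

The yellow outcome is what separates this from a fixed approved list: it leaves room to propose something new without pretending it is already blessed.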

There's a lot of degrees of freedom that can be provided when you actually encode the rules and then provide that down to the developers. Because the other challenge that I see, especially within security teams a lot of times is they want to just provide a list and say, I just want to send a list to everybody in the organization saying if you're using this component, stop using it. If you're using this component, move to this version. It sounds good but the challenge is that within each application, the actual usage of that component could be very different as well as the test coverage.

So in other words, moving a particular component might be easy for one application, but highly risky for another and if that other application maybe isn't actually exploitable to the problem that you're worried about, why would you take that risk? Especially if you're about to ship an important release, it might make more sense to wait and revisit that as tech debt later, right? The people who are best able to make those decisions are the ones actually working on the applications, the people who are actually in the field, not the people who are just looking at a map saying, it seems like a good idea to attack over here. The only way you can achieve that at scale is through that automation and the tooling and federating that information down to the people best able to act upon it.

Shane Hastie: Shift left, how do we move that security thinking earlier in that food chain?

Shift left - moving security thinking earlier in the development process [13:34]

Brian Fox: Yeah. That's effectively what I've just described is kind of that motion, isn't it? It's capturing the information from the legal team, from the architecture team, from the security team and getting it to the developers earlier, the left being earlier in the software life cycle and allowing them to make those better decisions up front so they know something will be inconsistent with policy. They're probably not going to choose it because again, they're trying to get stuff done. Why are they going to fight that fight if there's a better choice in front of them? They'll make a better decision just with the visibility. Sometimes the carrot approach doesn't always work unless there's a stick somewhere. We've designed our tooling to allow it to be more flexible in early development, but then if those warnings and those policies are ignored, companies can configure it so that you could block a release for example.

So having that flexibility to provide them that early indication that like, hey warning, the drawbridge is up ahead, maybe you need to slow down and deal with this as opposed to, we're just going to break your builds and stop all progress because we don't like what you're doing. That is not development friendly and causes more problems than it solves. So the shifting left mentality here is in fact exactly what I'm describing, taking all that information, bringing it to the developers and allowing them to make the right decisions with a little bit of guidance, with a little bit of backup and enforcement and visibility that's required because let's face it, sometimes people still ignore good advice.

So having that backstop usually will allow the leaders who are responsible for that risk mitigation to feel like, okay, you've done the thing that I'm more worried about. You've prevented us from shipping more software that's going to cause us to get into worse harm and we've provided the early warnings to the development so as long as they do what they're supposed to up front, there's no mashing of the gears. There's no culture war going on here. That tends to be how we see companies be really successful at this.
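The combination of early warnings and a release-time backstop can be sketched as a stage-dependent enforcement table. A minimal illustration in Python; the stage names and the `enforcement_action` function are hypothetical and do not reflect any particular product's configuration:

```python
# Hypothetical mapping from pipeline stage to what a policy violation triggers.
STAGE_ACTIONS = {
    "develop": "warn",   # early indication, the build keeps going
    "build": "warn",
    "release": "block",  # the backstop: violations stop the release
}


def enforcement_action(stage: str, violations: list) -> str:
    """Return "pass", "warn" or "block" for the given stage and violations."""
    if not violations:
        return "pass"
    return STAGE_ACTIONS[stage]


if __name__ == "__main__":
    issues = ["component X has a known critical vulnerability"]
    for stage in ("develop", "build", "release"):
        print(stage, enforcement_action(stage, issues))
```

The same violation produces a warning early in development and a hard stop only at release, which is the warn-first, block-later behaviour described above.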

Shane Hastie: And for some companies, this is a significant shift. How do you nudge people in that direction?

This is culture shift, not practices and policies [15:35]

Brian Fox: It depends on where they're coming from, honestly and it depends on the culture. Like we say, it's rarely the tech, it's the humans behind the tech. So, assessing an organization's maturity and inclination to move in that direction is the challenge and I tell our teams all the time, if I could put a button to solve that problem for them I would do it. The problem is not the software. The problem is the culture and what they're trying to do and if they're trying to do the wrong thing, we need to educate them on why that is important. I think the Log4j incident is the perfect incident to really highlight the disparity between those who are prepared and those who are not. Log4j again, this vulnerability was fairly easy to pull off.

Exploring the Log4j incident [16:15]

Brian Fox: The exploit was easy to pull off and the scope of the impact was so broad and the third point being there's so much public attention on it, that even if you weren't exploitable or vulnerable, you were probably paying attention just for the sake of stopping your customers from asking you. So it was high visibility, widely deployed and easy to pull off. I can't think of another example as perfect as that. We saw some of our customers who were using the tooling and had it rolled out, they were able to remediate 80 to 90% of their portfolio of tens of thousands of applications within two days of the first disclosure coming out. And then every time there was a subsequent fix, because the Log4j incident had multiple phases to it and there were four or five releases over the course of a week or so, those companies were able to stay at 80 to 90% of their portfolio tracking with those versions every time they came out within two days.

If we go and look at the statistics worldwide for the downloads of the Log4j versions from our repository, from Maven Central, 36% of them are still, today, for the known vulnerable versions, and we're two months past that, more like 10 weeks at this point actually. We see organizations who, as we sit here right now, are still trying to assess where in their portfolio they actually have Log4j. They don't have any idea what they're looking for or where they're supposed to look, and so if you're in that phase, how do you move on to the remediation? And you contrast that with huge organizations who fixed it in two days, it was almost a non-event. It was important because everybody was asking about it, but they immediately knew. The engineering teams got the alerts at the same time the security teams did, they all knew they had to fix it, because it was a level 10.

They knew they couldn't cut another release most likely unless they fixed it. Everybody did their job and it was a non-event versus these other ones who are sending out emails, asking application owners if they're affected. That's the equivalent of your auto manufacturer emailing you and asking you to go check your brake rotors to see what manufacturer is stamped on it because you might have a problem. That's insane when you think about it but I still see organizations dealing with that right now for the Log4j problem and so that's where the discrepancy comes from the prepared versus the not prepared. It is huge and huge is a massive understatement.

Shane Hastie: That's one example that really highlights things. Tell us some other stories. What are some of the things that you've seen good and bad?

The exploits are getting more sophisticated [18:41]

Brian Fox: In some ways I've referred to the Log4j incident as a little bit boring. People are like, what do you mean, that's boring? That was huge. It's like, yeah, but what happened here was two pieces of functionality. It's a stretch to call them bugs. One piece in the Java runtime, one piece in Log4j, when combined together bad things happen. It's almost like two prescription medicines are fine, but you take them at the same time and you could have interactions. That's what this vulnerability was and it existed in the code since 2013. This was somebody finding a new problem. It was disclosed and fixed within days of it being known, but then all the attacks happened after because the world didn't update fast enough. This is the problem we've been dealing with since basically forever and the window of time from the bad guys exploiting these has gotten shorter from weeks down to days.

However, the last five years I've really been talking a lot about the new attacks which are intentionally malicious components that are being slipped upstream into repositories with the express intent of causing harm. The difference there being that you might have had something in your software, nobody knew it was bad until the bad guys figured out how to make it do something and now you're racing to remediate it, versus something where the minute you touch it you're exploited. So that for me is the new thing and that's where I've been thinking and trying to educate people for the last five years because when you understand that dynamic, you can't think about doing application security the same way you have been, which is many times legacy programs are trying to make sure everything is scanned before it's shipped or put into production.

That's great and if we take the auto manufacturer analogy, the Deming principles from Toyota and the supply chains, use fewer suppliers, use better parts from those suppliers and track where all those parts go. That's the basics. Everybody should do that. We expect that happening from our car manufacturers, but those practices are about building better, more efficient, safer and cheaper cars. They are not about protecting the factory itself from an intentional bombing. So if you do a great job of building the car that doesn't help you make the factory safer and so when you're talking about malicious components, that's really the analogy here.

If you're focused on scanning your software before you ship it, you've missed what might have happened upstream in the development environment and in a modern world with continuous integration, continuous deployment that development infrastructure probably has the keys to be able to touch the production systems. So if an attacker can get their way into that infrastructure, they can often find their way around the rest of the system. It's not just infecting developer machines. That's where the war is right now. So that's why I say Log4j, while broadly impacting and having woken a lot of people up, is actually last decade's war, not what's happening every single day right now.

Shane Hastie: So how do we fight that war?

Borrowing ideas from credit-card fraud prevention [21:34]

Brian Fox: That's a great question. The way we've taken to solving it, we took a page from credit card fraud protection and if you think about the early days of credit card fraud, they used to ship out booklets with card numbers in it that were blacklisted. After they were known to be stolen they were put on this list and you're supposed to check the list. Then they got a little bit more agile and made systems that could dial in and check the number in real time but those were very much reactive. It was only after it was reported the card was stolen that you appear on this effective blacklist. That's the old, boring vulnerability thing.

After we know this thing is exploitable then it gets on the list and everybody's supposed to stop using it. That's the Log4j problem I was describing. When we've gotten to the modern world, what the credit card companies have had to do is figure out how to stereotype each of us as consumers and understand what's normal for us and what's abnormal. So before the pandemic, I traveled a lot. My credit cards were used all over the world, but what I didn't do is I didn't go buy TVs in Kansas because I don't live there. Hotels, restaurants, rental cars, yeah, all the time, but at a store, no. And so that would be abnormal.

Then they would block that and text me a thing saying, was this you? The reason they were doing that is because it was abnormal for me. So what we've done is we've basically taken that as inspiration and now we look at all the new releases that get pushed into these repositories. We look at, who's been contributing to them? What are they doing? What are the dependencies? Has this person supposedly just cut a release from Asia when all of their releases come from California? That might be an indication that it's not who you think it is that's actually doing this work. So we've done that. We've built models and then the AI can kick those events out the same way the credit card company does and say, wait a minute, this is fishy. With the turnaround time from when the community would find a vulnerability to when you could get the message out, it was already too late, because some of these things might have put back doors on the developer machines, so even a day is too long.

That's how we've been countering it. So combined with some of the other techniques we have, when we put a suspiciousness score on a component, customers can configure the policy to say, we don't want our developers using something that's got this type of flag, that's suspicious. So you're basically doing the equivalent of stopping the transaction at the register before they walk out the door with the TV and then you figure out it was fake. That's what we're doing and that's the only way that I can think of dealing with this because it is very much a real time threat, exactly like credit card fraud and that type of scenario.
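The credit-card-style approach described above (learn what is normal for a publisher, flag what is abnormal, and let policy stop the component before it is used) can be illustrated with a toy heuristic. The source describes trained models; the sketch below is a hand-written stand-in in Python, and the `Release` fields, the weights, and the threshold are all hypothetical:

```python
from dataclasses import dataclass


@dataclass
class Release:
    publisher: str
    region: str            # where the release appears to have been cut from
    new_dependencies: int  # dependencies added in this release


def suspiciousness(release: Release, history: list) -> float:
    """Toy score in [0, 1]: higher means less like the publisher's past releases."""
    past = [r for r in history if r.publisher == release.publisher]
    score = 0.0
    if not past:
        score += 0.5  # no track record at all
    elif release.region not in {r.region for r in past}:
        score += 0.5  # published from a region never seen for this publisher
    typical = max((r.new_dependencies for r in past), default=0)
    if release.new_dependencies > typical + 2:
        score += 0.3  # unusually many new dependencies
    return min(score, 1.0)


def allowed_by_policy(score: float, threshold: float = 0.5) -> bool:
    """Policy gate: reject components whose score reaches the threshold."""
    return score < threshold


if __name__ == "__main__":
    history = [Release("alice", "California", 0), Release("alice", "California", 1)]
    odd = Release("alice", "Asia", 1)
    s = suspiciousness(odd, history)
    print(s, allowed_by_policy(s))
```

The gate runs when a component is requested, which mirrors stopping the transaction at the register rather than reporting the fraud afterwards.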

Shane Hastie: You've spoken about today's vulnerabilities, what's in the crystal ball? What's next year's or next month's vulnerabilities? What should we be thinking about?

Looking forward – types of attacks to be aware of in the future [24:09]

Brian Fox: I think we're still in the early days of these supply chain attacks. In 2017 there were a handful of them that had me looking at it, going guys, wake up, this is the new thing, this is going to be a thing. In 2018 there was an explosion of about 400% in those. In 2019 it exploded another 600% on top of that. In 2020, 2021 I don't even think we can count it anymore. There are so many, but just for a sense of magnitude, there's one type of attack that a researcher disclosed back in about February or March of last year, 2021. Since that time Sonatype alone has reported 63,000 instances of that to NPM and Python and other repositories of people trying to exploit that. That's just one example that we've had in the last year and so just a handful of years ago I could tell a story of 20 events and walk through each of them and why they were significant.

Cyber-crime is roughly ten times larger (in $) than the drugs trade [25:07]

Brian Fox: But now we're talking hundreds of thousands of attacks across the spectrum and we're just at the beginning of it. For context, I usually have slides that show this, but in 2016, I believe it was, the worldwide global drug trade as an industry, if you were thinking about it in VC terms, how big is this industry? What's the total addressable market? It was something on the order of $550 billion worldwide. That same year cyber crime was about a $600 billion industry. So five, six years ago, cyber crime was already a bigger industry than all of the drugs in all of the world combined. That's pretty shocking when you step back and think about how much time and attention we as a society spend dealing with drugs. And now I don't think we've seen the actual numbers, but at the time they were projecting that by the end of 2021, the cyber crime industry was going to be a $6 trillion industry.

Meanwhile, the drug trade really hasn't grown much. It's basically a flat thing bubbling around 500 to 600 billion. So at $6 trillion, cyber crime is roughly a ten times bigger industry, and if you want to think about it again in VC terms, that's the amount of money being invested against us, the industry trying to produce good software. They have $6 trillion at stake to steal from us. That's a lot of incentive for them to get really creative and be very persistent about this. So that's why I feel like we're, unfortunately, just at the beginning of this, it's only going to get worse and that's why organizations really need to be aware of this so they can really think about how they realize this in the modern world.

Shane Hastie: Some really interesting and somewhat disturbing thoughts in here. Brian, if people want to continue the conversation, where do they find you?

Brian Fox: You can find me on Twitter at Brian_Fox or Sonatype. I'm out there. You can find me, Brian at Sonatype.

Shane Hastie: Thanks very much for taking the time to talk to us today.

Brian Fox: Thanks for having me.

Mentioned:

Sonatype

Brian on Twitter

About the Author

Brian Fox

