
Securing the Development & Supply Chain of Open Source Software (OSS)



David Wheeler discusses how OSS is developed and distributed via a supply chain model, how OSS developers can develop and distribute secure OSS today, and how potential users can select secure OSS.


David Wheeler is an expert on open source software (OSS) and on developing secure software. His works on OSS include "Publicly Releasing Open Source Software Developed for the U.S. Government", and "Open Source Software is Commercial". He also helped develop the U.S. Department of Defense (DoD) policy on OSS. He is the Director of Open Source Supply Chain Security at the Linux Foundation.

About the conference

InfoQ Live is a virtual event designed for you, the modern software practitioner. Take part in facilitated sessions with world-class practitioners. Hear from software leaders at our optional InfoQ Roundtables.


Wheeler: My name is David A. Wheeler. I'm here to talk about securing the development and supply chain of open source software. I think it's important first to talk about what open source software is. Open source software is software licensed to users with the following freedoms: to run it for any purpose, to study and modify the program, and to freely redistribute it, either the original or a modified version, without restrictions like royalty payments. If you want a full definition, a good place to go is the Open Source Definition from the Open Source Initiative. There are a number of different licenses used to release open source software. They include the MIT, Apache-2.0, 3-Clause BSD, LGPL, and GPL licenses. Regardless, if you release software under any of those licenses, it is open source software. If you have software that's not open source software, common antonyms are closed source or proprietary software. It's important to understand that open source software is commercial software. At least in the U.S., it's defined that way by law, because as soon as software is licensed to the general public, it's commercial software. If you ever work with the U.S. government, for example, that's an important fact to know. An important thing to know about open source software, though, is that these licenses enable worldwide collaborative development of software. You can see that all around the world.

Open Source Software (OSS) Is Everywhere

Indeed, it's everywhere. These are just some quotes from some Synopsys studies: 98% of the code bases they analyzed contained open source software. In fact, even when a code base as a whole wasn't open source, on average 70% of the code base was open source software. The average number of open source libraries they found in the code bases they analyzed was 528. I'm sure you know how averages work: 528 is not a max, that's just the average. Open source is everywhere. There's no slowdown in sight; the use of open source grows.

All Software under Attack via Vulnerabilities, and Supply Chain

That's all interesting. However, I also want to emphasize that software is under attack, via vulnerabilities in the software as it's deployed, and also via the supply chain: the chain from the developer's head and fingers all the way through to where the software is deployed. These are just some indications. Some are open source. Some are proprietary. Hopefully, it won't take much to convince you that software is under attack. Unfortunately, that includes open source software.

Is OSS or Proprietary Software Always More Secure?

I want to quickly, right up front, make something very clear. I often get the question, is open source or proprietary software more secure? The correct answer is neither. The reality is that neither open source nor proprietary software is always more secure. If you care about security, then you evaluate the software to decide if it meets your requirements. That said, there is something that gives open source a potential advantage. Saltzer and Schroeder, back in the 1970s, defined security design principles. They're still just as valid today. One of those principles is called open design: "The protection mechanism must not depend on attacker ignorance." Open source better fulfills this principle, so it has a significant potential advantage. This should surprise no one. If the idea is collaborative development and collaborative review, we already have experience from academia, from science, from engineering, that having others review work is more likely to produce better results. It's widely perceived as a potential advantage, but a potential advantage is not always an achieved advantage. You still need to look at things. Indeed, even if it's an improvement, no software is perfect. It's very rare to find software that has absolutely nothing that can be improved. For example, vulnerabilities can be found in even well-run projects. Continuous careful review is more likely to find vulnerabilities over time.

Common Problem: Known-Vulnerable Reused Software

Why is this important? One problem is that most projects and organizations really don't have any idea what's running on their systems. They think they do. They think they have this amazing digital infrastructure. In fact, it depends on other components they may not even know exist, which leads us to a common problem. It's well known that known-vulnerable reused software is a serious problem today. Many applications and systems fail to update all the components used within them. As a result, they end up with known vulnerabilities. The vulnerabilities weren't known when the components were brought in, perhaps, but they've been discovered over time. Yet those aren't getting fixed. This is not unique to open source; it's true for reused closed-source software as well. However, there's typically a lot more open source software in use, so failure to update open source software is especially dangerous. Synopsys found that 84% of the code bases they looked at had at least one component with a known vulnerability. On average, there were 158 known vulnerabilities. The average vulnerability was 2.2 years old. People are not updating, and that failure to update when vulnerabilities are discovered can lead to a lot of problems.
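As a tiny illustration of the kind of check scanning tools automate, here is a sketch that flags components whose version appears in an advisory list. The package names, versions, and advisory data below are invented for illustration; real tools pull from feeds such as OSV or the NVD.

```python
# Minimal sketch: flag components whose installed version appears in a
# known-vulnerability advisory list. The advisory data is invented for
# illustration; real tools pull from feeds such as OSV or the NVD.

# Hypothetical advisory database: package name -> set of vulnerable versions.
ADVISORIES = {
    "examplelib": {"1.0.0", "1.1.0"},
    "otherlib": {"2.3.4"},
}

def find_known_vulnerable(components):
    """Return (name, version) pairs that have a known vulnerability."""
    return [
        (name, version)
        for name, version in components
        if version in ADVISORIES.get(name, set())
    ]

deployed = [("examplelib", "1.1.0"), ("otherlib", "3.0.0")]
print(find_known_vulnerable(deployed))  # [('examplelib', '1.1.0')]
```

The point is not the lookup itself but that it must run continuously: a component that was clean when ingested can appear in the advisory data later.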

Software Supply Chain Integrity Map

Let me switch gears a little bit, because I want to step back now that I've covered a few points. It's a whole lot easier to understand potential problems and how to fix them if we come up with a simple model, a way to think about things. Fundamentally, it's easy to think about how open source software is developed and distributed as a software supply chain integrity model: from the developer's local environment, through a forge for that particular project where the source code is shared, out to build and verification environments, then released in some package repository or distribution form, where it is eventually selected, either into larger systems or for deployment. Unsurprisingly, once you look at that simple model, you realize attackers can attack, or at least attempt to attack, all of those places. Absolutely. The good news is, there are ways to counter or at least mitigate these kinds of attacks. You have to realize that there's something that needs to be mitigated. Let's talk about those solutions, because trying to do everything is fruitless. You need to focus on what's important first.

OSS Developer? Do this (now)

If you're an open source software developer, you need to do certain things now. If you're evaluating open source software, you should be looking for open source projects that do these things. If you're a developer, learn how to develop and acquire secure software. Generally, it's not taught in schools. If you're an open source project, try to earn a CII Best Practices badge. It's a set of best practices for open source software, especially focused on security. Use many tools to find vulnerabilities via your build and verification environment. The tools will have false positives, it's true. The tools will fail to find certain things; those are called false negatives. That's also true. You should still use tools, not as your sole way to solve all problems, but as part of your solution. You should monitor for known vulnerabilities and enable rapid updates when they occur. Evaluate before selecting software to be used. One of the big problems right now is typosquatting: check that what you bring in is what you're expecting to use. Finally, continuously improve. I think those first six are the important places to start. Once you've got those under control, by all means, continuously improve. Attacks get better, so look for the attacks you're not covering. Improve that.
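The typosquatting point can be sketched as a simple pre-install check: warn when a requested dependency name is suspiciously close to, but not exactly, a name on your approved list. The approved-package list and function name below are hypothetical; real package managers and scanners do this far more thoroughly.

```python
# Sketch of a typosquatting check: before installing a dependency, warn
# if the requested name is close to -- but not exactly -- an approved
# name. The approved list here is illustrative.
import difflib

APPROVED = {"requests", "numpy", "cryptography"}

def typosquat_warning(requested):
    """Return a close approved name if `requested` looks like a typo, else None."""
    if requested in APPROVED:
        return None  # exact match: fine
    close = difflib.get_close_matches(requested, APPROVED, n=1, cutoff=0.8)
    return close[0] if close else None

print(typosquat_warning("requets"))   # 'requests' -- probable typosquat
print(typosquat_warning("requests"))  # None -- exact match
```

A check like this catches the classic attack where a malicious package is published under a near-miss of a popular name.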

Course: Secure Software Development Fundamentals

I mentioned the Secure Software Development course. What's most important is that developers learn how to develop secure software. If you have no idea where to start, here's a free course. It'll take a day and a half of your time, and it will teach you a whole lot of stuff that you should have been taught in school but weren't. If you want to prove that you learned the material, you can pay to earn a certificate. It's on edX; that's typically how a number of edX courses work. This is actually a project within the OpenSSF Best Practices working group.

CII Best Practices Badge

Another thing you can do if you're an open source project is try to earn a CII Best Practices badge. There's a little icon right up there. This identifies best practices for open source software, focusing on quality and security. If a project meets certain criteria, it gets a badge. There are actually three badge levels: passing, silver, and gold. Even getting the passing badge is a significant achievement. There's lots of participation. I actually lead this particular project; we have over 3700 participating projects, and over 500 have achieved a passing or better badge. You can see more information there. If you're involved in an open source project, get a badge. You'll learn a lot, and then others can see that you're trying to do the right thing.

OSS User? Things to Consider for Evaluation (now)

What happens if you're a user of open source software? First of all, is there evidence that the developers are working to make it secure? Everything I just said, you might say, "I'm not an open source developer, so it doesn't matter." Yes, it does, because you want to see whether the folks who are developing your software are working to produce secure results. Other things you can ask: is it easy to use securely? Is it maintained? There are various indicators you can look for, like recent commits and releases, and multiple developers, ideally from multiple organizations. Does it have a governance model, maybe a governing board? Does it have significant use? You need to be careful here; it's really easy to get caught up in fads. That's a really big problem in the software world. "Google uses it. It must be perfect for me." "Facebook uses it. It must be perfect for me." No. They may have different problems than you do. Almost certainly they do. However, if there are no users, it's probably going to get no review. You don't want to just choose something because it's cool or fashionable. That's a terrible idea. It's common, but it's a terrible idea. Still, no users is a problem. What's the license? If it has no license, for example, or one of these strange, not-really-open-source licenses, then it's not open source, and you're not going to get some of the benefits of open source. If it's important, look at it yourself. It turns out that with open source software, you can take a quick glance at it. Did you acquire it securely?


There's something called OpenChain. It's an open standard, primarily focused on open source license compliance. As part of that, you have to figure out what components you're ingesting. That's a really good place to start, because once you know what components you're ingesting, you can start asking questions about them.
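As a small illustration of that first step, knowing what you've ingested, a Python environment can enumerate its own installed components from package metadata. This is a sketch of the inventory idea, not a full OpenChain process.

```python
# Sketch: the first step OpenChain-style compliance implies is simply
# knowing what you've ingested. For a Python environment, the installed
# distributions can be enumerated from package metadata.
from importlib import metadata

def list_components():
    """Return {name: version} for every installed distribution."""
    return {
        dist.metadata["Name"]: dist.version
        for dist in metadata.distributions()
        if dist.metadata["Name"]  # skip distributions with broken metadata
    }

inventory = list_components()
print(f"{len(inventory)} components ingested")
```

An inventory like this is also exactly the input an SBOM or vulnerability scan needs.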

Own Software Evaluation

What's your own evaluation of that software? It's important to you. Not examining it is a risk. You may say, "I'll just use this proprietary software." If you didn't examine it, that's a risk. Don't be afraid of looking at software, because even a brief review of code can give you some insight. Here's a list of some things you could look for. Even a quick look at something can often teach you a lot. Indeed, you can pay other organizations to do it for you if it's important enough.

Can People Insert Malicious Code in Widely-Used Software?

One of the reasons that people look at code is to ask the question, can people insert malicious code into open source? Yes, they can. They can also insert it into proprietary software. How could that happen? Wouldn't that be illegal? It's true, it wouldn't be legal, and attackers don't care. Attackers will do it whether or not it's legal. Anyone can modify code and make it malicious; there's something called a hex editor. That's it. The trick is to get into the supply chain, which in the open source software world means you actually subvert or mislead the developers and have no one notice it later. There are mechanisms, in at least the larger open source projects, which counter and greatly reduce the risks of this. Indeed, people have tried, and open source repositories have demonstrated resilience. The Linux kernel had an attack back in 2003: somebody tried to insert malicious code, and it was immediately detected and countered. There have been various repository attacks. More recently, the University of Minnesota attempted to insert intentionally vulnerable code. All those were immediately rejected. The past experience with Borland's InterBase and Firebird suggests that, in fact, these can be detected by open source software projects.

Things to Consider When Downloading and Operating Software

You also need to make sure you download it correctly, and that it's the right thing. Make sure it's the right name. You'd be surprised how many people don't do that very well. Download in a trustworthy way, using things like HTTPS. Of course, you need to operate it securely too. The usual protect, detect, respond stuff absolutely applies to open source, just as with any other software. Continuously monitor. If you find a vulnerability in a dependency, be prepared to update rapidly. You need to update software when there's a vulnerability faster than the attacker can exploit it. It's not that hard to prepare for, using package managers and automated tests. The real problem is that you have to prepare in advance.
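The "download in a trustworthy way" advice can be sketched as a checksum verification step. The artifact bytes and digest below are stand-ins; in practice the published checksum comes from the project's release page, fetched over HTTPS from the project's own site.

```python
# Sketch of verifying a downloaded artifact against a published SHA-256
# checksum. The file contents and expected digest are illustrative.
import hashlib

def verify_sha256(data: bytes, expected_hex: str) -> bool:
    """Compare the artifact's SHA-256 digest to the published one."""
    actual = hashlib.sha256(data).hexdigest()
    return actual == expected_hex

artifact = b"pretend this is a downloaded release tarball"
published = hashlib.sha256(artifact).hexdigest()  # stand-in for the real value

print(verify_sha256(artifact, published))           # True
print(verify_sha256(b"tampered bytes", published))  # False
```

A digest check like this catches tampering in transit or on a mirror, though it only helps if the expected digest itself was obtained from a trustworthy source.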

What's Coming in the Future?

What's coming in the future? I think there's going to be more help in evaluating open source software. I think we're very much going to see a significant increase in the use of software bills of materials, SBOMs. Basically, when most of your software is actually reused software, it doesn't make any sense to ignore whether those components have known vulnerabilities. That's the majority of your software nowadays for most projects. Ignoring that means you're ignoring most of your vulnerabilities. That's not ok. Your customers and their customers don't want to have to experience that. I think we're going to see a lot more SBOMs, and more package managers and repositories implementing countermeasures. I think we're going to see more use of things like verified reproducible builds and cryptographic signature verification. I think we're going to see a lot more about integrity attestation. We're going to see more use of memory-safe and safe languages. Probably some use of formal methods. I'm sure some people would like to see a lot more, but I think we'll see some more.

Information to Evaluate OSS

The Open Source Security Foundation is developing, among other things, what they're calling a metrics dashboard, to try to make it easy to learn more: "I was thinking about using this particular package," or, "I've got a whole bunch of packages, can you help me figure out which ones are particularly risky?" At this point, the site does exist, but it's in very early stages. It's in development, and they would certainly love feedback and help.

Verified Reproducible Builds, and SPDX

I think we're going to see a lot more verified reproducible builds. For example, Orion, from SolarWinds, was proprietary software that suffered a significant subversion because its build system was subverted. Verified reproducible builds are specifically designed to counter that attack. The idea is that if you can rebuild from the same source code, the same tools, and so on, and produce exactly the same resulting executable package with multiple different independent efforts, it's much less likely that all those build processes were subverted. There's been a lot of progress towards making this a reality. It's been hard work by folks who've been working on it for a number of years, but I think there's been a lot of success, and I think we're going to see it increase over the next several years. I mentioned software bills of materials. There's a specification called SPDX, which is an ISO draft standard, about to be released in the relatively near future. I think we're going to see a lot more in general about SBOMs.
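The core of the verified-reproducible-builds idea can be sketched in a few lines: independently produced artifacts are only trusted if they are bit-for-bit identical. The "build outputs" here are just stand-in byte strings, not real packages.

```python
# Sketch of the verified-reproducible-builds idea: several independent
# rebuilders each hash the package they produced from the same source;
# the artifact is trusted only if every digest agrees.
import hashlib

def builds_reproducible(artifacts):
    """True if all independently built artifacts are bit-for-bit identical."""
    digests = {hashlib.sha256(a).hexdigest() for a in artifacts}
    return len(digests) == 1

honest = b"\x7fELF...identical build output"
print(builds_reproducible([honest, honest, honest]))      # True
print(builds_reproducible([honest, b"subverted build"]))  # False
```

The security argument is statistical: an attacker would have to subvert every independent builder the same way, which is far harder than subverting one build system.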

Sigstore: Software Signing Service

Sigstore is a way to make software signing much easier to use. Technologically, the ability has existed for years, but it's hard to do in practice. The math works, but that's not enough. One project, called sigstore, is trying to make the whole software signing and verification process much easier. It's really the verification that's the biggest problem. Sigstore uses things such as transparency logs to help achieve that.

Increased Use of Memory-safe and Safe Languages

I think we're going to see a lot more use of memory-safe and safe languages. A vast number of vulnerabilities today are the same old vulnerabilities people have been seeing for decades. Many of those exist basically because languages like C and C++ don't protect you. I think we're going to see ongoing efforts to enable memory-safe languages, particularly with Rust. Projects like curl, Mozilla Firefox, and the Linux kernel are either using Rust or working towards using it. There are challenges, but I think they can be overcome.

Open Source Security Foundation (OpenSSF)

The OpenSSF is a foundation within the Linux Foundation, which is collaborating to secure the open source ecosystem. There's a link to their current working groups. If you're interested in open source and security more generally, this is a very good organization to get involved in. Please do get involved. More broadly, if you're interested in improving open source software security, get involved. Of course, there are many other organizations working in various ways. If you want to make the future a better place, the best way to achieve that is to get involved and help make it a better place.

Distribution of Known Vulnerabilities by Programming Languages

Do known vulnerabilities vary widely by programming language?

There is no language that guarantees there are no vulnerabilities. There will always be a potential for vulnerabilities regardless of the programming language. However, certain vulnerabilities are especially common in certain languages; probably the most obvious examples are C and C++, which are not memory-safe. In most programming languages, for example, trying to access an array out of bounds will immediately be caught. Not so in C. That turns into an instant potential for vulnerabilities. Even an attempt to read out of bounds of an array can become a vulnerability. C and C++ have a huge number of undefined behaviors, and a lot of the software developers who write in those languages have no idea what they even are. If I add one to the maximum int, that means it'll just wrap around, right? No, it means a potential security vulnerability, because that's undefined behavior. There are other gotchas, but there are gotchas in every programming language. It is fair, though, that a number of people want to move towards memory-safe languages, to at least eliminate those classes of vulnerabilities, even though that won't eliminate them all.
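To make the contrast concrete, here is the same situation in a memory-safe language: Python catches the out-of-bounds access immediately, and its integers don't silently wrap on overflow. This is a sketch of the principle, not a claim about any particular codebase.

```python
# Illustration of the point above in a memory-safe language: Python
# catches an out-of-bounds access immediately, and its arbitrary-
# precision integers don't wrap -- unlike C, where both cases are
# undefined behavior an attacker can exploit.
import sys

def safe_read(buf, index):
    """Out-of-bounds reads raise an exception instead of leaking memory."""
    try:
        return buf[index]
    except IndexError:
        return None  # caught, not a silent information leak

buf = [10, 20, 30]
print(safe_read(buf, 99))  # None -- the bad access was caught

# Integers are arbitrary precision, so "maxint + 1" just grows.
big = sys.maxsize + 1
print(big > sys.maxsize)   # True -- no wraparound
```

In C, the equivalent out-of-bounds read could return adjacent memory, and the equivalent overflow is undefined behavior the compiler is free to exploit.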

How to Treat Vulnerabilities in Third-party Dependencies

We check for vulnerabilities during the build and fail the build when vulnerabilities are found. What if the vulnerability resides in the third-party dependencies? Do we need to act? Is it ok to give a three-month period to repair the issue?

There are two sides to that coin. First of all, just because a dependency has a vulnerability does not mean it's exploitable in your situation. Nokogiri, for example, includes a lot of support for XML. If there's no way to give it XML, there's no way to exploit it. Most libraries have capabilities and functions that might have a vulnerability if used, but maybe you don't use that part or that feature, or maybe you protect the data in some other way. For example, data presented in a certain form might become a vulnerability, but you do what's called input validation ahead of time, and therefore that malicious data can never reach the component. There is always the possibility, and in fact in many cases a high likelihood depending on the situation, that some of the vulnerabilities in the components can't be exploited. In which case, absolutely, you can take your time.
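The input-validation idea can be sketched as a gate in front of the parser. The DOCTYPE check below is a deliberately simplified stand-in for a real validation policy, and the example documents are invented; the point is that rejected input never reaches the potentially vulnerable component.

```python
# Sketch of "input validation ahead of time": even if a downstream XML
# parser had an issue with, say, external entities, a gate that rejects
# suspicious input keeps malicious data from ever reaching it.
import xml.etree.ElementTree as ET

def parse_if_safe(text: str):
    """Reject documents with a DOCTYPE before handing them to the parser."""
    if "<!DOCTYPE" in text:
        raise ValueError("DOCTYPE not allowed")  # never reaches the parser
    return ET.fromstring(text)

print(parse_if_safe("<note><to>you</to></note>").tag)  # note

try:
    parse_if_safe('<!DOCTYPE x [<!ENTITY e SYSTEM "file:///etc/passwd">]><x>&e;</x>')
except ValueError as err:
    print(err)  # DOCTYPE not allowed
```

With a gate like this in place, a vulnerability in the parser's DOCTYPE handling may simply not be exploitable in your deployment.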

However, it can be very difficult to determine for sure whether or not something is a vulnerability. There was an academic paper long ago about a tool called RATS, where the authors found what they thought was a vulnerability, but they couldn't figure out any way to exploit it. After a lot of analysis, they decided it was a false positive; it really couldn't be exploited. Far later, they found out that, in fact, it was exploitable. So although you may think something is not exploitable, that's actually incredibly difficult to verify. If it's absolutely not exploitable, you don't need to hurry, but it's very difficult to determine that.

As far as a three-month window? Three months, if it's actually exploitable, is pretty concerning. Your CIO does not decide how fast you need to repair it. Your process does not decide. The person who decides is the attacker. The attacker decides when you have to fix it. If the attacker is going to develop and release an exploit in the next few hours, you have a few hours. That's the time you have. A lot of folks really want to update once every few months or years. For most projects, those times are long gone. I run a project where we generally update to production in a day; we try to do it within an hour. Why? Because the attackers are moving, and you have to move faster than the attacker. How do you do that, if you're on a three-month cycle? By preparing ahead of time. You should have all your automation in place. Use automated tools and package managers to figure out what you have, so you can do a rapid update.

Would that update break anything? That's why you have an automated test suite. "My automated test suite runs, but I'm not sure I can trust it." Then you have a bad test suite, and now is the time to fix it. If your test suite passes, you should be ready to release to production. I realize that's a hard bar for a lot of organizations. That's the goal you want to achieve. Many folks, of course, are not going to quite get there in the near term, but that's what you should be aiming for. You've got to respond and deploy faster than the attacker can exploit. There's no point in fixing the barn after the horse has left. You've got to fix the barn before the attacker steals the horse. You are on a clock, and the clock is not measured in months.

The Performance of Memory-safe Languages

Do memory-safe languages like Rust pay a penalty in performance for their safety?

That's one of the exciting things about Rust. The vast majority of programming languages that are memory-safe have a pretty significant performance penalty. The list of languages that are memory-safe and don't have a performance penalty is pretty short. Among languages with a fairly large number of projects, you're really talking about Rust, Ada, and Fortran; there may be one or two others, but there are relatively few languages that are both high performance and memory-safe. Rust is actually pretty impressive that way. A lot of projects are working on what's called Rustification: switching at least parts of their program over to Rust, because the performance penalty is relatively small. Indeed, it's more complicated than a simple argument about penalty; in general there's either no penalty or very little, and in some cases the Rust programs run faster. That's partially because not only is it memory-safe, countering things like buffer overflows, but the Rust model also makes it a lot easier to write parallel code without fear. In a lot of C programs, you'd be afraid to do a whole lot of things in parallel, because would my threads stomp on each other? In Rust, you can actually get a performance advantage from it.

Recommendations in Commercial Packages Using Components

What are the recommendations in commercial packages using components or packages?

Commercial means both open source and proprietary. I don't have a particular recommendation in terms of scanning tools. Right now, it is way more important to use one than to use zero. All the tools have pros and cons. All the tools have false positives and false negatives. It's frankly better in many cases to use multiple tools. Start by going from zero to one, then add more. Multiple tools can be a good thing, whether you're scanning for components or looking at the source code. For example, there are various SCA tools, and they use different data sources, so one may report a vulnerability and another may not, because they identify vulnerable components from different data. There are various source code scanners that look for vulnerabilities, and they have different heuristics. Multiple tools is better. Start by going from zero to one, then keep adding.
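The reason multiple tools help can be sketched as a simple union of findings: each tool draws on different data sources and heuristics, so together they cover more than either alone. The tool reports and CVE identifiers below are invented for illustration.

```python
# Sketch of why multiple scanners help: the union of their findings
# covers more than any single tool. Findings are (component, vuln-id)
# pairs; the data here is invented.
def merge_findings(*tool_reports):
    """Union of (component, vulnerability-id) findings across tools."""
    merged = set()
    for report in tool_reports:
        merged |= set(report)
    return merged

tool_a = {("libfoo", "CVE-2021-0001")}
tool_b = {("libfoo", "CVE-2021-0001"), ("libbar", "CVE-2021-0002")}

findings = merge_findings(tool_a, tool_b)
print(len(findings))  # 2 -- tool_b caught one that tool_a missed
```

The union grows your true positives; the cost is that false positives also accumulate, which is why triage still matters.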

Tips on Keeping Dependencies Up to Date on Big Projects

Do you have any specific tips on how to keep dependencies up to date on big projects?

First of all, don't do it by hand. That's the number one thing. When you only had one library, doing it by hand was no big deal. When your average is 528, that's ridiculous. You've got to do this with automated tools. You want to use package managers, and you need to use at least one tool to scan, keep track, and warn you. There are many systems, tools, and services that can track and warn you when there's a known vulnerability in a component.

Finally, automated testing. It's really hard to overemphasize that. If you're trying to get an update out within an hour or a day, you can't say, "after we update our software, we're going to go through this six-month test cycle." You don't have six months. The attacker is not going to wait. You've got to have a good automated test suite. High coverage is at least a good indicator, but it's more than that. For example, you need to make sure you have negative tests. This is a big failing of most of the TDD community. The TDD community talks about making tests: make your tests, and then write the code to make them pass. That actually can be really helpful. However, you also need to test that things that shouldn't happen, don't happen. For example, if your system has a login system and user accounts, but users shouldn't be allowed to do certain operations, make sure you have tests that verify users cannot succeed in doing those operations. Those are called negative tests: tests to make sure that things that aren't supposed to happen, can't happen.

A failure to include negative testing is one of the big problems in a lot of test suites today. It's how the Apple "goto fail; goto fail" vulnerability happened. They had a library that checked certificates. Given a good certificate, it said it was good. Given a bad certificate, it also said it was good. The whole point was to check certificates; if it always says everything's fine, it's not useful. It's doing the wrong thing. In fact, it's a vulnerability. You need to make sure that things that shouldn't happen don't happen. If a user without authorization tries to delete something in your system, make sure your tests include someone trying to delete something they're not supposed to be able to, and verify the system rejects it.
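A negative test like the one just described can be sketched with a deliberately tiny permission model. The role names and `can_delete` function are hypothetical stand-ins for a real authorization system.

```python
# Sketch of a negative test: as well as checking that authorized actions
# succeed, verify that unauthorized actions are rejected. The permission
# model here is a deliberately tiny stand-in.
import unittest

def can_delete(user_role: str) -> bool:
    """Only admins may delete; everyone else must be refused."""
    return user_role == "admin"

class DeleteTests(unittest.TestCase):
    def test_admin_can_delete(self):            # positive test
        self.assertTrue(can_delete("admin"))

    def test_regular_user_cannot_delete(self):  # negative test
        self.assertFalse(can_delete("user"))

    def test_anonymous_cannot_delete(self):     # negative test
        self.assertFalse(can_delete(""))

suite = unittest.defaultTestLoader.loadTestsFromTestCase(DeleteTests)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print(result.wasSuccessful())  # True
```

A "goto fail"-style bug, where `can_delete` always returned True, would pass the positive test but be caught immediately by the two negative tests.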

Good Best Practices and Tools for Screening Libraries Used Within a Project's Container Images

Are there good best practices and tools for screening libraries used within a project's container images?

Container images are a challenge. First of all, for libraries in general, there's the CII Best Practices badge; I actually run that. It doesn't take all that much time to get a badge. Please go to the CII Best Practices badge site: if you run or are involved in an open source project, get a badge. That covers the specific libraries within the container images. When should the checks be performed? When the libraries are imported, and at the very least before the project uses them. If you can do checks on a library at the point of use, that can be helpful too. It depends on what checks you're talking about. Checks like "is this known to be vulnerable?" you usually run routinely as part of your development processes. As far as checking whether the library does what it's supposed to do, some checks right when it's being used, say on startup of the container, are not a bad idea. The problem is, it's hard to be specific, because it very much depends on what checks you're talking about.

Safe and Fast Alternatives to C and C++

I think Rust is probably one of the most common ones used today. Ada is an old language, but it's perfectly fine, and a fast language. In particular, if you're looking for something that's stable, has been around a long time, and has an ISO standard, it's a decent alternative. Then there's good old Fortran. Fortran is old; it's the first programming language that got any serious use. However, it's actually incredibly fast. It's widely used within the machine learning community, because a whole lot of the neural net stuff is basically matrix manipulation, and in many cases those routines are written in Fortran. Rust, Ada, and Fortran are probably the shortlist. Obviously, you could switch over and use assembly, which is the fastest, but very few people are going to do that today.

Dealing with a Custom Operating System

You're dealing with something that has a custom OS. That's actually not necessarily a bad thing, because you can customize your OS to include only the components you care about. If code is not in your container, virtual machine, or system, nobody can exploit the vulnerabilities in code that's not there. The key thing is to automate everything. There should be a single command that updates the whole thing, and a single command to deploy. You should automate, because if you do, then "I need to update" becomes "run the tool and figure out what I need to update."

Run all the tests. Do everything. Get this thing out. If you automate it, that can be pretty quick. If it's not automated, if you have to follow a long, complicated manual process, first of all, it's unlikely that you'll be able to do it correctly every time. Second of all, it would be too stinking slow. The pain of doing that means that people don't do it very often, either. That's actually the unsung challenge. Once you automate it, "I need to do an update today" becomes, "Ok, I'll do that right now. It takes me two commands. I'll just check and make sure everything worked out, or let the automation determine whether everything worked out." If you automate, "how do I do this?" becomes no big problem.




Recorded at:

Feb 08, 2022