
Securing Software from the Supply Side


Summary

Nickolas Means talks about the tools that GitHub provides for Open Source maintainers to improve the safety and security of the software supply chain at the source, as well as how to leverage work to make your own codebase more secure.

Bio

Nickolas Means works as a Senior Engineering Manager at GitHub. He is an experienced technology leader with over a decade of experience delivering mission critical web applications and building highly-engaged, effective distributed engineering teams. He loves finding awesome people and giving them the space they need to do amazing things together.

About the conference

Software is changing the world. QCon empowers software development by facilitating the spread of knowledge and innovation in the developer community. A practitioner-driven conference, QCon is designed for technical team leads, architects, engineering directors, and project managers who influence innovation in their teams.

Transcript

Means: I'm Nickolas Means. I'm a senior engineering manager at GitHub and I lead our work on security and compliance product features. In other words, my team works on stuff to help you write more secure and compliant code, not to keep GitHub more secure and compliant. That is a very difficult thing to explain sometimes.

Using Open Source

How we develop software has changed a lot over the past 25 years or so. One of the biggest changes, of course, is open source. What started as a niche academic activity, where developers at universities or large institutions would write some code and share it with each other, has grown to become the foundation of almost all modern software development. In fact, some estimates say as high as 99% of all software projects anywhere in the world, in any industry, currently use open source. It's really hard to find a project these days that doesn't use open source. What this means is that it's easier than ever to build software. Instead of having to implement your own low-level data structures, or database adapters, or web servers, you can just import that stuff into your project from open source. You don't have to do it yourself. It's remarkable if you stop and think about it, because code reuse went from this pipe dream we all had 20 years ago to this thing that we do every day without really even thinking about it. It's just part of our everyday development practice now.

We get to reuse libraries and frameworks on a daily basis that save us a huge amount of time, and we've also built amazing tooling to support this code reuse. Package managers specifically are integrated into our development and build processes, and make the work of importing and using these frameworks and libraries almost negligible. As our tools for sharing and collaborating have improved, our dependency graphs have been growing.

Not to pick on one ecosystem in particular, but it's not all that uncommon to see a JavaScript project that pulls in more than a thousand dependencies, directly and transitively, through its dependency tree. It's now normal for any project, even something somebody does on the weekend as a hobby, to use a hundred open source dependencies. This is great: it means that we can focus on writing the code that makes our application unique. We get a bunch of baseline functionality and best practices baked in for free through the open source libraries and frameworks we use. We get to operate at a higher level, we're more efficient, and we can deliver value more quickly. But there are challenges to this that we might not think about as often as we should.

It's great that we get to benefit from the work of thousands of others as we build software, standing on the shoulders of giants, but it also means that thousands of others now have commit access to our production code. If you stop and think about it, most of the code you're putting into production is likely code your team didn't even write. If you're like most organizations, you probably haven't reviewed it either; who has the time? If we put our security hats on, this starts to get scary in a hurry, because if one of your dependencies has a vulnerability, there's a pretty good chance that you have a vulnerability as well, and this might not even be a dependency that you knew you had. It might be a dependency of your dependency's dependency. It might be buried transitively all the way down at the bottom of the stack. An innocent mistake, or worse, a malicious attack, anywhere in your supply chain has the potential to affect you deeply. The world of software has changed and we're now playing catch-up. We need new processes, new tools, and new norms to secure software development.

At GitHub we recognize the trust that the community has placed in us and we know that we have a huge responsibility and opportunity to help. We're taking this very seriously, but the challenge is that it's not something that we can do alone. We can't solve this problem unilaterally. In fact, one of the things that makes this problem space so difficult to address is it's not something that any one group can fix. It takes our whole community. In order to make the software supply chain as a whole more secure, you have to coordinate the activities of at least four different distinct groups of people. Security researchers, open source maintainers, individual developers, and security teams all have to work together. Most attempts at addressing dependency security have focused on just one of these personas.

We decided to take a step back and think about this problem space holistically. What do security researchers need to do to find vulnerabilities and communicate their findings? What do maintainers need to do to keep their users safe? What do developers, and for companies big enough to have one, the security teams that support them, need to do to use open source safely and confidently? We took a look at the entire open source security workflow and identified the challenges.

Security Researchers, Maintainers, Developers

Let's talk about these group by group. The first group we'll talk about is security researchers. These are the white hat hackers, the people that are out there poking and prodding at open source libraries and looking for holes. There are a couple of challenges that they face. First, the discovery process is completely manual. There's no great tooling around it. It involves years of honing the hunting instinct for finding these holes in software packages. It's an elite activity that only a small number of people know how to do, and that's to the detriment of our overall safety. Second, when a researcher finds a vulnerability, what do they do with it? They need to report it to the maintainer, but the vast majority of open source maintainers don't provide a private mechanism for reporting vulnerabilities.

Speaking of maintainers, let's talk about them next. When a maintainer discovers or is notified about a new vulnerability in their code, they need to fix the problem and alert their users. Where they're going to work on the fix is their first challenge. Unless they maintain a private fork of their code base, it's difficult to collaborate with anybody on a patch, or to share the fix with the researcher that reported the vulnerability in order to validate that the fix actually fixes the problem. Second, once the maintainer has a fix, they need to get their users to update. That fix doesn't do anybody any good if it doesn't make it into code bases around the world. More than 90% of open source projects have never publicly announced a security vulnerability. This doesn't mean they're not there, and it doesn't mean they're not getting fixed. It just means that when they do get fixed, they're fixed quietly; they're pushed out as a version update. The reason for this is that the whole process can be pretty intimidating and scary. The process of getting a CVE, if you've never done it before, is pretty involved, and remember that most open source maintainers are volunteers. They're not people that are doing this professionally. They're doing it for free in their spare time.

Now for our last group, developers. That's probably most of us here in this room. The first challenge we face is actually getting those critical updates into our code bases. The stream of updates and alerts that flow in and try to get our attention every day can be overwhelming. Even if we do happen to catch an alert, we've got so much to do that often this is not at the top of our list. This is not the thing that grabs our attention, because there's feature pressure to deal with, there are deadlines to deal with, there are conferences to deal with. We've all got more demands on our time than we can actually deal with.

The stats we've accumulated say that around 70% of critical vulnerabilities remain unpatched 30 days after developers have been notified. That's a long time to leave a security hole out in the open. And vulnerabilities recur. Sometimes they happen because you're using a library that you imported in an unsafe way; sometimes they happen because code that you wrote is unsafe, because not all vulnerabilities are resolvable by just updating a dependency. Similar vulnerabilities recur again and again around the world. At GitHub, we've been investing heavily in making this entire open source security workflow much easier for everyone involved. We want our supply chain tools to make working with open source securely just as easy as the pull request made contributing to open source in the first place.

Demo

Let's walk through a demo of some of the tools that we've built in the last six months or so to help maintainers and consumers of open source make this ecosystem that we all care about so much more secure. For our demo, I want to walk you through the life of a vulnerability. I want to point out upfront that some of this won't be directly relevant to you if you're not an open source maintainer or security researcher. That's fine; I want to walk you through it so that you have a good idea of all of the work that has happened before you ever receive that vulnerability alert in your repository.

Let's start at the beginning by putting on our security researcher hats. Let's imagine we think we've found a vulnerability in the Loofah library. Loofah is a Ruby library for sanitizing HTML and XML input. The first thing that we would likely do is this: we'd visit the Loofah repository and see if we can find any contact information for the maintainer. We definitely don't want to just open an issue and accidentally zero-day all of the users of this library. As a researcher, I know that many repositories list their disclosure policy in a SECURITY.md file at the root of the repository, and sure enough, Loofah has one. What if I didn't know that? What if I was a new security researcher, or somebody that's trying to learn the craft? What if I just went and tried to open an issue? Adding a SECURITY.md file to the root of your repository triggers another important feature.

If you look right over here in the sidebar, there's a pop-up that says, "It looks like this is the first time you're opening an issue. Are you reporting a security vulnerability? If so, you should check out this project's security policy." If we click on it, it takes us straight to that SECURITY.md so that we can see how these maintainers would prefer that we report this vulnerability to them. You can also add the SECURITY.md file to a top-level .github repository in your organization, and this file will cascade down to every repository that you've got. You only have to set this up once. If any of you are open source maintainers, I'd highly recommend taking the time to set this up, because it will stop a lot of accidental public disclosure. Let's say that the Loofah team agrees that I found a vulnerability. They need a place to collaborate on a fix in private, perhaps including me in the discussion to understand the vulnerability better and validate that the proposed fix addresses my concerns. Then once they fix it, they need a way to notify other users that they all need to update this dependency. The maintainer can do all of this with GitHub security advisories.
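For maintainers who want to set this up, a minimal SECURITY.md is just Markdown; the contact address and wording below are hypothetical placeholders, not Loofah's actual policy:

```markdown
# Security Policy

## Reporting a Vulnerability

Please do not open a public issue for security problems.
Instead, email security@example.com (placeholder address) with a
description of the issue and the affected versions. We will
acknowledge your report and coordinate a fix and a disclosure
timeline with you before anything is made public.
```

The same file placed in an organization's top-level `.github` repository cascades to every repository in the organization.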

I'm not a maintainer of the Loofah repo, so I can't demonstrate this feature there. I'm going to hop over to a demo repository that I am a maintainer of. You can tell that my super popular open source library is very complex. All the code that it's got is a README, but that's not important. What I want to show you is that every repository on GitHub has a security tab, and if we click into the security tab, we're taken first to a view that shows any open security alerts on the repository. We'll talk about those in a minute. What we actually want to talk about is security advisories. If I click over here, you can see that there's one draft security advisory that's already open, but not published.

Let's create a new one. Let's see what this process is like for an open source maintainer. First, we're taken to this form where we can fill in some details about the vulnerability that we've identified. We can put in our affected versions. In this case, we'll say that it's anything below 1.01, and we'll put our patched version as 1.01. Then there's a package name and an ecosystem. I'm going to put demo in here so that I don't cause our curation team grief; normally, you would put RubyGems, or npm, or pip, or something like that in here. Obviously, it's just a README, so we're going to leave it as low severity. If you already had a CVE, you could put your CVE in right here, and then there's a body that you can fill out with details about impact, patches, workarounds, any references, or any more information you'd like to link. I'm just going to go ahead and create the security advisory here.

You can see that all the metadata that we've entered is here in the header, along with all the information that we've entered in the body, but crucially, there's this block right in the middle to collaborate on a patch in private. If the open source maintainer clicks this button to start a temporary private fork, we will create a private fork in the background. It's important to note that this is not like a regular GitHub fork. This fork is completely disconnected from the repository. It won't show up in your forks list, it won't show up in your repository network. It's completely isolated so that you can feel safe and secure working on a security patch here without accidentally disclosing anything. It even gives us the git commands that we need to clone and check out that repository and create a new branch. I'm going to copy that command, hop over to my command line and pull it down, open up the README, and type in the fix really quick. We'll save and close it, and we'll commit our change. I'm going to push it up to GitHub; there's a command over here we can copy that will make that easy for us as well.

If I go back over to the security advisory, I can refresh the page and you can see that the branch that we just pushed is here waiting for us to create a pull request. We'll go ahead and hit compare and pull request here. This is just a standard pull request with one significant difference: you can't actually merge it here. You're prevented from merging it from the regular pull request view; you have to go back to the advisory to do it. This is to keep you from unintentionally disclosing this vulnerability, because you don't want to merge this code until you're ready to publish the advisory. If we scroll down, we see that we've got the pull request here ready to merge, but below we've got this button to request a CVE.

This is brand new. As of a couple of months ago, GitHub became a CVE Numbering Authority. This means that we can issue CVEs for any open source library that's not covered by another CNA. If you do open source at Google, or Facebook, or somewhere like that that has a CNA in-house, you would still use your regular CNA. If you're an independent open source maintainer who's not covered by another CNA, we want to help you with that process. We want to make the CVE process easier and less intimidating for you. I'm going to skip clicking this because I don't want to send it to our curation team. You'll notice when I switch over to publish the advisory that it won't let me publish it, and the reason for this is that we've got to merge our code first. I'm going to merge the pull request, and this merges the code into our main repository. Then I can come down here and click publish. GitHub asks if we're sure we want to publish this, since it can only be undone by contacting support. More importantly, once you've published it, it's out in public, and we all know that once something is public it can't actually be undone.

Now this is published. We can still request a CVE even after publishing; if you forgot to do that, or you decided you needed more details to do it, you can still do that. If I go back to the security tab and to the advisories, we can see that we've got one published advisory on this repository now. I can click in here and it will show us all the details. For any open source repository that uses these, you can click into its security tab and into advisories, and you'll be able to see any security advisories that they've published. This is all open and out in public. Importantly, if this is in one of our covered ecosystems, this will also trigger an alert. It will go to our curation team; they'll evaluate it, make sure it's not missing any data, make sure it's got all the information it needs, and GitHub will publish an alert to any repository that's affected by it.

What does that look like? For that, let's hop over to a different repository. This is just a simple Ruby app; it doesn't have much to it. I really only need the Gemfile here, so let's hop into our Gemfile. That Loofah vulnerability we talked about earlier? Let's add it in here. Maybe this is not great practice, but maybe we copied the string to add loofah from another project, and it happens to have an old version number embedded in it. Nobody's ever done that. We'll close it out and bundle install. Now we've got this update in our Gemfile and in our Gemfile.lock. Let's take a look at it, commit it, and push it up to GitHub.
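The dependency pin in the demo amounts to a Gemfile entry like the following; the exact file contents weren't shown on screen, so treat this as an illustrative reconstruction:

```ruby
# Gemfile (illustrative sketch of the demo app's manifest)
source "https://rubygems.org"

# Copied from another project with an old version number embedded
# in it, as in the demo; this release predates the patched 2.3.1.
gem "loofah", "2.2.0"
```

Running `bundle install` then records the resolved version in Gemfile.lock, which is one of the manifests GitHub parses for the dependency graph.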

I'm going to go back over here to the repository and refresh it to see our commit. You can see, as quickly as we've gotten that code pushed up, GitHub has already found and alerted us to the security vulnerability in that version of the library. Let's click into it and see what it says. It turns out that the version of loofah we used, loofah 2.2, had a security vulnerability, and we need to update to version 2.3.1 or later to fix it. We can also see right here that GitHub is already working on generating an automated security fix for us.

While waiting on the PR to come through, let's talk a little bit more about how GitHub does this work for us. Our curation team, the same team that evaluates GitHub security advisories, also curates new vulnerability data that comes in from the National Vulnerability Database, as well as selected community vulnerability sources. If you've looked at CVEs before, you know that the data is not very structured. It's inconsistent, so we take that data and we pull metadata out of it so that we can automatically act on it. I want to call out that that data is actually publicly available, free for everybody to use, via our GraphQL API. The other part of this is that when you push a dependency manifest up to GitHub, we parse it and we store it in a graph database containing all of the dependencies for all of the public and opted-in private repositories across GitHub. That means that when we publish a new security vulnerability alert, we can go look for all of the repositories across all of GitHub that use that library and alert them. We're doing our best to keep the community safe by letting people know when they have a vulnerable dependency in the repository.
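That curated advisory data can be queried directly. A GraphQL query along these lines, run against GitHub's v4 API with an access token, lists known vulnerabilities for a package; the field selection here is a sketch of the schema, not taken from the talk:

```graphql
{
  securityVulnerabilities(first: 5, ecosystem: RUBYGEMS, package: "loofah") {
    nodes {
      advisory { summary publishedAt }
      severity
      vulnerableVersionRange
      firstPatchedVersion { identifier }
    }
  }
}
```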

The interesting thing is, by the time you see a security alert for your repository, you'll probably already have an automated security fix as well. Let's refresh the page and see what that looks like. Sure enough, there's our pull request. We'll click in. You can see that the automated pull request process has taken us up to the minimum version required to fix the vulnerability and no further. We don't want to update further than we have to; that should be your choice, but we do want to update you far enough to get you safe. Embedded in the PR, we've got release notes, so you can see what's changed between the last version you committed and this version. We've also got the changelog and a list of commits, and you can click here on the compare view and it will show you everything that's changed about this library since the last version you depended on.
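The "minimum version required and no further" behavior can be illustrated with RubyGems' own version classes. This is a hypothetical sketch of the idea, not GitHub's actual implementation:

```ruby
require "rubygems" # for Gem::Version and Gem::Requirement (ships with Ruby)

# Sketch: given the advisory's vulnerable range and the versions
# published for the gem, pick the smallest release that falls
# outside the vulnerable range.
vulnerable_range = Gem::Requirement.new("< 2.3.1")
published = %w[2.2.0 2.2.3 2.3.0 2.3.1 2.4.0].map { |v| Gem::Version.new(v) }

minimum_fix = published.sort.find { |v| !vulnerable_range.satisfied_by?(v) }
puts minimum_fix # prints 2.3.1, the lowest safe release
```

Even though 2.4.0 is also safe, the sketch stops at 2.3.1, mirroring the demo: jumping further than necessary is left as the developer's choice.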

We've also got a compatibility score here, which is really interesting. This is based on CI runs for public projects: for the projects that we've generated this pull request for, how often is that build passing? If 70% of the time updating this dependency is causing a problem, I'm probably going to take a close look and make sure I'm ok. I can do that by scrolling down. If I had testing set up on this repository, my CI results would be right here in the pull request, just like they always are. I would be able to click merge pull request confidently and know that this wasn't going to break my code base.

If you've got a private repository that you'd like to enable these features for, it's super easy to do that. Public repositories are already opted in. For private repositories, just go into your settings tab; it's actually on the very first page of settings. You just scroll down to data services, and you have to allow GitHub to perform read-only analysis of your repository, since it's private. Once you've enabled the dependency graph and security alerts, you'll start getting security alerts and automated security updates for your repository. It's just that simple.

Finally, I'd like to show you a feature that we built for security teams and others in large companies who are responsible for monitoring how an organization utilizes open source. It's called dependency insights, and it's available on Enterprise Cloud plans. If you go to your organization, click on the insights tab, and then click on dependencies, you can get a holistic view of all of the software that you're dependent upon across all of the repositories in your organization. You can filter this by license type. Say we want to know everything we've got that has an Apache 2.0 license: we can click right here and the list will filter down to just dependencies with Apache 2.0 licenses. We can clear that out, and then we can filter by critical vulnerabilities instead.

We can see everything that we depend on in our organization that has a critical vulnerability. Interestingly, we've still got a dependency on event-stream, the library that was subject to a famous dependency hijack attack late last year. If I see this and I want to know where I need to go to resolve it, I can click on the dependents right here and it'll show me which repository in my organization is still using this library. Then I can go bug whoever maintains that repository to fix it up.

From the demo, you saw that we've addressed several of the problems we identified in the open source security workflow. We've addressed coordinated disclosure with GitHub security policies, we've addressed private collaboration and user alerting with GitHub security advisories, and we've addressed fixing your vulnerable dependencies with GitHub security alerts and automated security updates. That still leaves manual discovery and recurring vulnerabilities to address, but we recently announced the acquisition of Semmle, who have built an amazing set of semantic analysis and code scanning technologies to address these areas. We'll have more announcements in the months to come about how we're using Semmle technology to make vulnerability discovery and monitoring for recurring vulnerabilities even easier.

At GitHub, we've made a huge commitment to securing the open source supply chain: giving researchers the tools to find vulnerabilities, maintainers the tools to fix and announce them, and developers the tools to easily take advantage of those fixes. This is a journey we're all going on together, and we're excited to do our part to secure software development. If you want to learn more, please visit our security page at github.com/features/security. It will walk you through all the things that I've just shown you. We're continuing to make big investments in this space as well, and we'll have more to announce in a couple of days at GitHub Universe and in the months to come.

Questions and Answers

Participant 1: This is a very meta question; I've had this conversation since I first heard about this. If I have a public repository, can I opt out of this notification? I'm concerned about whether there's a liability: if I sit on it for some period of time and it causes some downstream problem, could I be liable for the breach? I said it's a very meta question, but it does concern me, depending on the size of the open source project, the impact of the dependency, and where it shows up, for example, if a bank, or an agency, or anything gets breached. Could you talk about an opt-out?

Means: You're talking specifically about downstream repositories?

Participant 1: I'm a maintainer, and if I get automatically notified but I'm in the hospital for a month or so and just don't have time to deal with it, it still has downstream implications for somebody who is using it.

Means: From a maintainer perspective, we always want to work with the maintainer. We don't ever want to zero-day them. We don't actually do responsible disclosure directly on the platform yet. You can set up a security policy to direct researchers to the right place that you, as a maintainer, want that information to go, but we're not going to publish it on behalf of the maintainer unless somebody has opened a public issue.

Participant 2: Actually, I want to pick back up on this issue. Some projects get neglected, and a community forks and maintains them; I've actually been on both sides of it. I have some neglected projects, and actually, there's a bunch of them that I've neglected. When a vulnerability is found, there's no use in contacting the person who is at the top of the tree, because they don't have time to deal with it.

Means: We don't, yes. One of the things we encourage in that scenario is, if you're not actively maintaining a repository, archive it, so that it's obvious to the community that it's not being actively maintained.

Participant 2: Yes, but somebody has to take the time to archive it. Then there's a whole community that forks it, and it just goes on. It's sometimes hard to see who the main person is.

Means: In that case, if it's a forked repository, we would rely on the SECURITY.md file in the forked repository to direct users to the right place.

Participant 2: So whoever owns that fork, they have to update it?

Means: Yes, they would update their SECURITY.md file. Exactly.

Participant 2: Another question: you have these things which are called stability or compatibility metrics. Is there a way to look at the failing runs? Do you guys disclose that? To see whose runs fail, because maybe they ran some weird version of Node or they messed up their data.

Means: We don't have that right now, but that's a great idea.

Participant 3: Is your security feature only available for contributors or for the consumers also?

Means: What do you mean? Which features?

Participant 3: The security tab feature.

Means: No, that's available for any repository at GitHub.

Participant 3: I've accessed my repository, I'm not able to see that.

Means: Really? Is it a public repository or private?

Participant 3: It's a private.

Means: What level?

Participant 3: Enterprise.

Means: Board level. They're not available for innersource yet.

Participant 4: A question about the CVE reporting. Is there a level of verification that happens once the CVE is reported, on the GitHub side, to verify it?

Means: When you become a CNA, there's a whole list of rules that you have to follow about who you can issue a CVE to and who you can't. That goes through our curation team, and if there's a data quality issue, or there are multiple vulnerabilities in one report, we'll work with the maintainer to get that separated out so that we can issue CVEs.

Participant 4: Then the second follow-up question is, you mentioned that you issue CVEs for everything that's not covered by another CNA. Is that a burden on the reporter? How does that process happen, if you could talk about it in more detail?

Means: That's actually something that we're looking to address. Ideally, for an organization like Google, we want you to be able to configure your CNA so that when you went through this process, you could get a CVE from your CNA, or we would direct you to your CNA rather than letting you get one from us. Right now, when we get a report in, we essentially look over the CNA list and make sure it's not covered by somebody else before we issue the CVE. It's a manual process right now.

Participant 5: The other product which you just showed, Semmle, is that available now?

Means: It's available for purchase. We haven't integrated it into the product yet; that will happen over the months to come.

Participant 6: If my library is using another library that has a vulnerability, would I get an active alert, rather than having to go to the GitHub website and check my GitHub page?

Means: We'd email it to you. You would get an email alert about that.

Participant 6: Ok. The second part is, is the bot already done with the PR?

Means: What do you mean?

Participant 6: The one that was running in the background. The bot that created the fix for the vulnerability.

Means: Yes. It already created the PR and we merged it in.

 


 

Recorded at:

Dec 09, 2019
