
Securing the Software Supply Chain: How in-toto and TUF Work Together to Combat Supply Chain Attacks


Summary

Marina Moore covers the fundamentals of both in-toto and TUF, and discusses how to combine them, with a real-world case study of how Datadog has been using the two technologies together.

Bio

Marina Moore is a PhD candidate at NYU Tandon’s Secure Systems Lab doing research focused on secure software updates and software supply chain security. She is a maintainer of many open source projects including The Update Framework (TUF), Uptane, in-toto, and Sigstore. She is also a Tech Lead for the CNCF's TAG Security.

About the conference

Software is changing the world. QCon empowers software development by facilitating the spread of knowledge and innovation in the developer community. A practitioner-driven conference, QCon is designed for technical team leads, architects, engineering directors, and project managers who influence innovation in their teams.

Transcript

Moore: My name is Marina. I'm a PhD candidate at NYU. We're really focused on this space combining some research ideas with actual practical implementations that can be used. That's what we're going to talk about. There's some theory, but mostly this is about how we actually use this stuff in practice. I'm going to focus on two tools. The overall idea is really how to combine these tools into a secure software supply chain.

What Are Software Supply Chain Attacks?

Software supply chain attacks, what are we actually talking about when we talk about these? First, we have a software supply chain. In order to attack something, you have to first define what it is that's being attacked. This is one definition, from Purdue, which defines the software supply chain as a collection of systems, devices, and people which produce a final software product. This is basically everything that happens between when some developers write some code and when that code is actually run in a production system. An attack on the software supply chain is when one or more weaknesses in the components of the software supply chain are compromised to introduce alterations into the final software product. This is when anything happens in that chain that's unexpected, or that changes something, especially in a way that can cause the final product to be vulnerable, or to have some arbitrary software in it. These attacks are very common. Per the 2022 Sonatype report, over the past 3 years we've seen an increase of around 700% in the number of supply chain attacks seen in the wild. This is a real problem. It's happening. Here are a few examples. They happen on all different pieces of the supply chain: in package managers, in the source code area, in updates and distribution. All over the place. The CNCF, the Cloud Native Computing Foundation, has a great catalog of a bunch of these different types of attacks that puts them in different categories. It's not every attack that ever happened, but it's a nice overview for folks new to this space, if you're interested in learning more about what attacks are happening. This covers just the attacks on open source projects, which means they're public and easy to find. It's a great database.

Solutions

There are a lot of solutions that have been proposed in this space. I think we've heard about some of them. Because it's such a big, growing problem, there's a lot of work happening. There's a lot of good work, but it all solves different pieces of the problem. In some ways, as with everything in cybersecurity, if you only solve one piece of the problem, the attackers just move to the place you didn't solve. You really have to think about the system cohesively. That's what we're going to try to do. We're going to broadly categorize these solutions into three different areas and then talk about how these come together. The first is evidence gathering. This is looking at what's happening in the supply chain, gathering evidence about what's going on. Information discovery is looking at this evidence and trying to learn things about what's happening in the system. Then, finally, we have policy and validation. This is, in some ways, the most important one, where you not only say, we have a bunch of metadata, we have a bunch of information about what's happening in the supply chain, but you also want to make sure that x happened and that y performed z, the exact stuff that needs to happen.

Common Link: in-toto

The first project we're going to talk about is a project called in-toto. We like to think of it as a common link in the supply chain where you can really tie together a bunch of what we call point solutions, solutions that solve one piece or another, in a more cohesive way. It's actually implemented by other projects as a common data format to communicate these different things. First of all, we have the evidence gathering. You have things like SLSA, as well as the various software bill of materials (SBOM) formats, CycloneDX and SPDX. All of these things provide metadata with information about stuff that happened at some point. An SBOM lists the dependencies that you're pulling in. SLSA provides information like what happened at the build step, and so on. This is the evidence gathering piece. You would put that information into in-toto, in what we call in-toto links. It's just a common format for the information. It's the same information, just transposed.
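To make that concrete, here is a rough sketch of what the payload of an in-toto link for a build step can look like, written as a Python dict. The field names follow the in-toto link format, but the paths, hashes, and command are invented for illustration.

# A minimal sketch of in-toto link metadata for a "build" step.
# Field names follow the in-toto link format; paths and hashes are invented.
build_link = {
    "_type": "link",
    "name": "build",
    "command": ["python", "-m", "build"],
    # Artifacts consumed by this step, keyed by path, with their hashes.
    "materials": {
        "src/main.py": {"sha256": "aa11..."},
    },
    # Artifacts produced by this step; a later step's materials can be
    # checked against these hashes to detect tampering in between.
    "products": {
        "dist/demo-0.1-py3-none-any.whl": {"sha256": "bb22..."},
    },
    "byproducts": {"stdout": "", "stderr": "", "return-value": 0},
    "environment": {},
}
# In practice this payload is wrapped in a signed envelope by the
# functionary who performed the step, e.g. via in-toto's tooling.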

Then you can send that information to information discovery systems. These are systems like Sigstore, which provides the Rekor transparency log, which holds a big graph of this information that is queryable. Things like GUAC, a project that visualizes what's happening in your software supply chain by taking in lots of different types of metadata, as well as other projects in this space. Then, finally, we have the policy and validation, what in in-toto we call the layout. It lays out the steps, which can easily be thought of as policy as well. You can take all this information. Then you have what we call a supply chain orchestrator, who decides on some policy that should actually be happening in the supply chain and writes up this policy. You could use this in any existing policy engine, any admission controller, anywhere where you're pulling stuff in. You then define what should be happening. You can then compare this to what actually happened in those links from a couple of steps ago. You have these links. You have the stuff that actually happened in your supply chain. You have the layout, which is the stuff that you want to happen. Then as an aside, you have some analysis of what's happening and what could be done better, which you can use to iterate on the layout. When you put that together with the image, you can get this attested final product. Each of those different steps in the supply chain contains not just the step that happened, but the outputs of the step. That's a cryptographic hash, which is then signed by the step's functionary, which means that you can actually verify that the output of one step is the same as the input to the next step. You can make sure that no tampering happened in between these steps. In the layout, you can enforce that these actors did stuff here, and the output from that was then inputted here, and so on.
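As an illustration of the layout idea, here is a hypothetical single-step layout sketched as a Python dict, using in-toto's artifact rule syntax (ALLOW, CREATE, MATCH, DISALLOW). The key ID, paths, and commands are placeholders, and a real layout also carries signatures over this content.

# Hypothetical in-toto layout: a "build" step that only Alice may sign,
# plus an inspection that ties the final wheel back to the build step.
# Keyids, paths, and commands are placeholders.
layout = {
    "_type": "layout",
    "expires": "2030-01-01T00:00:00Z",
    "keys": {"<alice-keyid>": {"...": "Alice's public key goes here"}},
    "steps": [
        {
            "name": "build",
            "pubkeys": ["<alice-keyid>"],   # only Alice's link is accepted
            "expected_command": ["python", "-m", "build"],
            "expected_materials": [["ALLOW", "src/*"], ["DISALLOW", "*"]],
            "expected_products": [["CREATE", "dist/*.whl"], ["DISALLOW", "*"]],
            "threshold": 1,
        },
    ],
    "inspect": [
        {
            "name": "verify-wheel",
            "run": ["inspect-wheel"],       # hypothetical inspection command
            # The inspected artifact must be byte-for-byte the product
            # of the "build" step, enforcing the hash chain:
            "expected_materials": [
                ["MATCH", "dist/*.whl", "WITH", "PRODUCTS", "FROM", "build"],
                ["DISALLOW", "*"],
            ],
            "expected_products": [["ALLOW", "*"]],
        },
    ],
}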

What's Missing: Distribution

One missing piece in this picture that I quickly summarized is the problem of distribution. You need to actually distribute these three things, the package, the policy, and the attestations to the user. Most importantly, you have to distribute this policy. If an attacker is able to compromise the policy of what should be done in the supply chain, then they're able to compromise any piece of the supply chain, by just changing that policy.

How Do We Distribute in-toto Metadata?

That's going to lead us into the next project that I'm going to talk about a bit. First, I'm going to talk a bit about the properties we need from secure distribution of in-toto metadata. We need to make sure that this information is timely. We talked a bit before about how this policy or this layout can iterate over time. Maybe you're iteratively improving the security of your pipeline. You want to make sure that even if a policy was valid today, and then you change your process from policy A to policy B, you want to make sure that people in the future will only validate against policy B, even though you previously signed policy A. You also have to make sure that these policies are coming from trusted users. This is especially important because the policy actually defines the users that will be trusted for the different steps of the supply chain. You can build trust, but you have to start with some point of trust. Finally, this has to be compromise resilient. If this becomes the single point of failure and the place to attack in a software supply chain system, then that will happen. We've seen really large motivated attackers in this space. I think the SUNBURST SolarWinds attack is a great example there. You have to make sure that even if one thing goes wrong, you can either recover or prevent that from causing a full breakdown of the security properties.

The Update Framework (TUF)

This is where we're going to come to this project called The Update Framework, or TUF. This is a framework for secure software updates, and really for secure distribution, applied to updates. It was built with this idea of compromise resilience and revocation built in from the ground up. It assumes that repositories that host software, as well as keys or developer accounts, can and will be compromised, and so it provides means to both mitigate the impact of a compromise and allow for secure recovery after a compromise takes place. The goal is a graceful degradation of security as more things are compromised. Obviously, if your whole system is compromised, stuff can still go wrong, but the compromise of each individual component has a minimized impact.

TUF Design Principles

It does so through the use of four design principles. These are responsibility separation, multi-signature trust, explicit and implicit revocation, and minimizing individual key and role risk. To start, we have this idea of responsibility separation, which comes back to this idea of delegation. I think this was a big thing in the keynote as well, this idea that you start with a root of trust, and you delegate down from there. TUF uses that property to divide responsibilities into different roles, delegated from a root of trust. By minimizing the scope of the root of trust itself, passing most of the day-to-day use down, you can actually use any keys involved in the root of trust less often, which means they can be used more securely. It allows for greater hardening because you're using them less. The more something is used, the less power it's given. For example, two of the different roles in TUF provide content attestations, or information about the actual content of a package or a software update, while a different role in TUF provides information about timeliness, to make sure that you're using the current version of a package or the current version of a policy, as we were talking about before. I'll show you the exact mechanisms and how we apply those. We basically separate the different roles using this idea of delegations. We have one role that's responsible for the integrity of packages, but that role can further delegate: I specifically want the alpha projects to be signed by Bob, and the prod projects to be signed by Charlie. If Bob signs alpha, everything is fine, but if he signs prod, then that will be rejected, because he's specifically trusted for alpha. It's minimizing the scope of a compromise of Bob's key.
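That alpha/prod split can be sketched directly in the delegations section of TUF targets metadata. The key IDs and key values below are invented, but the structure (keys, roles, paths, thresholds) follows the TUF metadata format.

# Sketch of a TUF targets delegation expressing "Bob signs alpha/*,
# Charlie signs prod/*". Keyids and key values are invented.
delegations = {
    "keys": {
        "<bob-keyid>": {"keytype": "ed25519", "keyval": {"public": "..."}},
        "<charlie-keyid>": {"keytype": "ed25519", "keyval": {"public": "..."}},
    },
    "roles": [
        {
            # Bob is trusted ONLY for paths under alpha/; a prod/ package
            # signed by Bob's key fails validation.
            "name": "alpha",
            "keyids": ["<bob-keyid>"],
            "threshold": 1,
            "paths": ["alpha/*"],
        },
        {
            "name": "prod",
            "keyids": ["<charlie-keyid>"],
            "threshold": 1,
            "paths": ["prod/*"],
        },
    ],
}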

Next, we have minimizing individual key and role risk. This goes back to that idea of more or less protected keys. For example, the root is a really high impact role, which means we need highly secure keys, which means these keys will be harder to use. You can have the root role require multiple signatures from multiple well-secured keys, say YubiKeys that are stored in a lockbox somewhere. I think for one project that uses TUF, we have five trusted root signers, requiring a threshold of three of them to sign, and they're distributed across different continents. These keys are stored in safe places. It basically requires a whole Ocean's Eleven movie to actually compromise this root of trust. It also means that it requires a week of planning to actually do a root signing with these people across three continents. Versus, you have lower impact roles which you can sign with online keys, which are much easier to use. You can do on-demand signing. You can change things every day, or every hour. By necessity, these keys are less secure, because you can't have five people across continents pushing a button every minute if you need something to change every minute. So you create this chain of delegations, starting at the top with the highly secured, hard-to-use, but very secure keys, all the way down to these online keys that are really easy to use, but slightly less secure, because if a server is compromised, all the online keys on that server will be compromised alongside it.
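That 3-of-5 arrangement is expressed directly in root metadata as a list of key IDs plus a threshold; a minimal sketch with invented key IDs:

# Sketch of a root role requiring any 3 of 5 offline keys to sign.
# Keyids are invented placeholders.
root_role = {
    "keyids": ["<key1>", "<key2>", "<key3>", "<key4>", "<key5>"],
    "threshold": 3,  # metadata is valid only with 3+ of these signatures
}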

Next, we have the principle of multi-signature trust. This is just the idea that you require multiple signatures on a particular role or a particular piece of metadata, so that it's not just one key that has to be compromised, it's two or five or whatever the number is. One compromised key isn't enough. Finally, we have explicit and implicit revocation. Implicit revocation is just timeouts. That's pretty straightforward. If stuff expires, then it's implicitly revoked. Explicit revocation was really a key design principle of TUF: it ensures that anyone higher in the delegation chain can explicitly revoke anything lower in the delegation chain. If online keys, which are used very frequently, happen to be compromised, anything above them in the delegation chain can immediately revoke them, and that will be seen by everybody because of this timeliness property of TUF. Again, I'll explain how. This is the why portion of the talk.
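Implicit revocation amounts to an expiry check the client performs on every piece of metadata it loads; a minimal sketch:

from datetime import datetime, timezone

def is_expired(metadata: dict) -> bool:
    # TUF metadata carries an "expires" timestamp; once it passes, the
    # metadata is implicitly revoked and clients must stop trusting it.
    expires = datetime.fromisoformat(metadata["expires"].replace("Z", "+00:00"))
    return datetime.now(timezone.utc) >= expires

# Example: long-expired metadata is rejected even if signatures verify.
assert is_expired({"expires": "2020-01-01T00:00:00Z"})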

The Targets, Snapshot, and Root Roles

Now we're going to get into the how. That's a good transition. We're going to build up the actual architecture of TUF starting from the packages. If you look at the far right of the slide, you see the actual packages that you're distributing. These don't have to be packages. They can really be anything you want to securely distribute. For now, we'll talk about the use case where you have some built packages and you're going to get them to some end users. You have three packages that you're trying to distribute. The first thing we're going to add is targets metadata. The targets role in TUF is responsible for the integrity of files. This is, I think, the classic image signing, the first thing you think of when you think of, how do you securely distribute something? You have someone sign it. The targets role signs stuff, but it's not just the targets role itself: the roles that it delegates to also sign stuff. You can have an offline, well-secured targets role that says, all A packages are trusted in this direction, all B and C packages are trusted over here. Then the B, C role actually signs those packages. Then, for some reason, A also has a further delegation. This could go on. Especially if you have a big organization, you don't have to share keys across the organization, you can just say, ok, this team has this key, they'll use that. This team over on this other part of the org has a different key. That prevents key sharing across these different people.
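For the integrity piece itself, each entry in targets metadata pins a file to an exact hash and length; a sketch of the signed portion, with invented values:

# Sketch of the "targets" section of TUF targets metadata. Each entry
# pins a file to an exact length and hash. Values are invented.
targets = {
    "packages/demo-0.1-py3-none-any.whl": {
        "length": 23456,
        "hashes": {"sha256": "cc33..."},
    },
}
# A client downloads the file, recomputes the sha256 and length, and
# rejects the package if either differs from this signed entry.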

Next, we're going to add the snapshot role, which provides a sense of consistency across all the different metadata that's on the repository. This will be important because of the next role I'll introduce, but basically it makes sure that you have a list of all the images that are currently valid, which means that you can hash that and have a timestamp on it, which gives you that timeliness property. If you check the timestamp, which has a hash of the snapshot in it, you can make sure that any package you're downloading is the one that's currently valid today, and not one that was valid at some different point in time. Then, finally, of course, we have the root of trust, which provides the keys that should be used by all these other top-level roles, as well as a self-updating feature for the root role itself.
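Put together, a client refresh walks root, then timestamp, then snapshot, then targets, validating each layer against the one above. Here is a rough sketch using python-tuf's ngclient Updater; treat the exact API details as approximate, and the URLs and paths as placeholders.

# Rough sketch of a TUF client refresh with python-tuf's ngclient.
# URLs and paths are placeholders; treat the API details as approximate.
from tuf.ngclient import Updater

updater = Updater(
    metadata_dir="/tmp/tuf-metadata",              # local trusted root lives here
    metadata_base_url="https://example.com/metadata/",
    target_base_url="https://example.com/targets/",
    target_dir="/tmp/tuf-targets",
)
# refresh() walks root -> timestamp -> snapshot -> targets, checking
# signatures, thresholds, version numbers, and expiry at every step.
updater.refresh()
info = updater.get_targetinfo("packages/demo-0.1-py3-none-any.whl")
if info is not None:
    path = updater.download_target(info)   # hash and length verified here
    print("verified package at", path)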

ITE-2: Combining TUF and in-toto

What does it look like when we actually combine these two pieces of technology? This is part of this goal of getting end-to-end software supply chain integrity. You have to protect the whole system, not just pieces of it. You have TUF, which can securely distribute not just those packages that we talked about here; on that right-hand side of the image you can actually put the in-toto layouts that you want to securely distribute, as well as the attestations, and all the other pieces of metadata that you need. Then you can have these layouts distributed from secure roles with high assurance. From the secure targets role, you can delegate to a specific high-assurance role that's used for these layouts, so the policies are only signed by that high-assurance role. The other roles, which are used more often, don't have permission to sign them. Here's a nice diagram of this in practice. This has actually been implemented at Datadog for the past 5 years or so. There's also an ongoing integration with Torizon, an IoT platform from Toradex, which has some interesting scalability properties as well. If you look at this picture of how these things fit together, in the top left over here, we have those TUF roles. This looks a lot like that picture I showed earlier. That's just what you saw there. The main difference is what this targets role is pointing to. The targets metadata has different signers, which sign the actual packages and the in-toto metadata. Then it has a direct delegation to the layouts and the policy pieces. That's a more direct link, because it's signed with those offline keys used in the targets role, versus these ones down here, used every day to sign new packages, used more frequently, which makes them necessarily slightly less secure. Those are just separated. If anything goes wrong, you can obviously revoke it, and all of those things.
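A sketch of what that split delegation might look like in the targets metadata, with invented role names, key IDs, and paths: an offline, high-assurance role is the only one allowed to sign layouts and layout signer keys, while an online role handles day-to-day packages and links.

# Sketch of an ITE-2-style delegation split. Role names, keyids, and
# paths are invented for illustration.
delegations = {
    "roles": [
        {
            # Offline, high-assurance role: the only role allowed to sign
            # in-toto layouts and the public keys of layout signers.
            "name": "in-toto-policy",
            "keyids": ["<offline-keyid>"],
            "threshold": 1,
            "paths": ["in-toto-metadata/*.layout", "keys/*.pub"],
        },
        {
            # Online role: signs new packages and link attestations daily,
            # but has no permission over the policy paths above.
            "name": "packages",
            "keyids": ["<online-keyid>"],
            "threshold": 1,
            "paths": ["packages/*", "in-toto-metadata/*.link"],
        },
    ],
}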

Demo

We're going to start out with a TUF repository. These are the roles that we saw in TUF: we have the root, snapshot, targets, and timestamp. This is all generated using an open source tool called the Repository Service for TUF, which is a new OpenSSF project that is basically working on the usability of spinning up these TUF repositories. We use that for the demo. These bins roles are delegated targets roles. They're just done in an automated way. This is really useful for certain applications, and it's done by default in this implementation. Going into each of these, we have the root.json, which contains, as you can see, all those roles I talked about before. It has those delegations to the snapshot and the timestamp listed in here with trusted keys, and all of that. Then we have the targets, which currently just includes these succinct delegations. This is just an automated delegation format to those bins on the repository. The timestamp has the hash of the snapshot, like I mentioned. These bins are currently empty; this is just the starting state of the demo. As you can see, there are no targets listed.

Now we're going to actually do something interesting. We have Alice, who is the supply chain owner, who's going to define this policy. She's going to define an in-toto layout that defines everything that should be done in our supply chain. In the layout we have this create step and a build step, what should happen there, the expected outputs, the fact that Alice should sign the result of the build, and then an inspection, which is basically just a comparison to make sure that the output matches. This is our initial state of the supply chain. This is a format that's defined by in-toto, but it's hopefully readable by humans. Then we're going to upload all of those to our TUF repository. Now, if we refresh, the next version of the metadata of these targets includes the layout signed by Alice's public key, the root.layout. In the other bin, we have her actual public key, because you have to distribute not just the layout, but also who signed the layout, because you don't know a priori what Alice's public key is. The way you're getting that is through another one of these TUF delegations. Alice created the project. She's actually executing the supply chain. As she goes along the supply chain, she's creating these links, these attestations to the different steps that are done. Now she's building it, creating all that metadata. Then, we're going to see it in the repository once she uploads it. All that metadata is now going to appear in these delegations in the next round. Now there's a lot more stuff going on here, there are a lot more targets. We have the links. Each of the links is included as its own artifact, and here we have the wheel, which is the final Python artifact that will be distributed. We still have that root layout, and the key, and another link. In practice, again, there would probably be an offline role signed with more secure keys specifically for the layout and the public key of Alice. For the sake of the demo, these are all combined, because we're doing it all online. Again, in practice, you'd probably separate that out a little bit more.

Now we have the client who's actually verifying that everything happened. All the client has to do is look at this in-toto layout, which is defined by Alice, and make sure that everything that was defined in that layout actually happened, performed by the expected actors. This is run in verbose mode, so you can see all the different verification steps that happen. It makes sure that all the steps happened. That passed, because Alice did in fact make this project, and we printed out our hello world example.
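Under the hood, that client-side check boils down to a single in-toto verification call over the layout, the layout signer's key, and the links. A rough sketch with in-toto's Python API; exact helper names vary across in-toto versions, and the key ID and key value here are placeholders.

# Rough sketch of client-side in-toto verification. Exact API details
# vary across in-toto versions; keyid and key value are placeholders.
from in_toto.models.metadata import Metablock
from in_toto.verifylib import in_toto_verify

layout = Metablock.load("root.layout")   # the signed layout Alice uploaded

# The layout signer's public key must come from somewhere we already
# trust -- here, the TUF targets delegation -- never from the layout file.
alice_keyid = "<alice-keyid>"            # placeholder
alice_pubkey = {"keytype": "ed25519", "keyval": {"public": "..."}}  # placeholder

# Raises an exception if any step, signature, or artifact rule fails.
in_toto_verify(layout, {alice_keyid: alice_pubkey})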

Let's make this a little bit more interesting. What if we then change the layout so that it's not just Alice building this project all by herself. We're going to clear out her state a bit, get rid of all those targets that were in there. This is just to show you that it's empty now. Then we're going to reupload the layout.

This is a new layout. This is the second layout. We now have a new key here. We have a Bob role that was defined. In addition, we have this update step, which was added to the supply chain. Not only is the project created, it can now be updated by somebody else before it's then built and sent out. Bob has permission for this update step. As you can see, Alice now generated the project. This is the video just uploading the layout, as before, and the public key. Then, Bob is going to pull the project and make some changes to it, which is allowed by the supply chain steps. He's going to change it and upload the links about exactly what changes were made. Then Alice is going to build it and upload it as before, getting all that metadata pushed to the repository. Now we can see again, we have the links. We'll have the final artifact in here as well.
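The interesting part of the second layout is the chaining. Sketching just the new step (key ID invented, rules in in-toto's syntax): the update step's materials must match the create step's products, and the build step would in turn match its materials against the update step's products.

# Sketch of the step added in the second layout: Bob may modify the
# project, but only starting from exactly what "create" produced.
# Keyid is invented.
update_step = {
    "name": "update",
    "pubkeys": ["<bob-keyid>"],   # only links signed by Bob are accepted
    "expected_materials": [
        ["MATCH", "*", "WITH", "PRODUCTS", "FROM", "create"],
        ["DISALLOW", "*"],
    ],
    "expected_products": [["ALLOW", "*"]],
    "threshold": 1,
}
# The "build" step's expected_materials would similarly MATCH ... WITH
# PRODUCTS FROM "update", so tampering between steps breaks the chain.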

This is actually interesting too, the way you actually link these things together. When you actually update the wheel, you have to know which layout is associated with it. This is our method in the metadata for linking things together. There are a bunch of little annotations, and they're all specified if you're interested in learning more there. Basically, all the different things are linked together. In this demo, we just have one layout for one project. Of course, in practice, you have 100 layouts for 100 projects, and so you can tie this all together and include them all in the same repository. The client now downloads it and runs all that verification again. As you can see, it all ran and passed. Those are all the steps that happened. Now we see that it's the same project, but Bob was also here.
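One way to express that association is TUF's custom field on a targets entry, where the package points at the in-toto metadata that vouches for it. A sketch with invented paths and hash:

# Sketch of tying a package to its in-toto metadata via the "custom"
# field of a TUF targets entry. Paths and hash are invented.
targets = {
    "demo-0.1-py3-none-any.whl": {
        "length": 23456,
        "hashes": {"sha256": "dd44..."},
        "custom": {
            "in-toto": [
                # The layout and links a client needs to verify this wheel.
                "in-toto-metadata/root.layout",
                "in-toto-metadata/create.link",
                "in-toto-metadata/update.link",
                "in-toto-metadata/build.link",
            ],
        },
    },
}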

Now we have the really fun part, or the dangerous part, depending: an adversary who tries to tamper with the supply chain. This is an adversary who does not have access to either Alice's or Bob's private keys. There are, I think, some details about which attacks can and can't be caught, as always, but in this example, the adversary does not have access to Alice's private key. The attacker built the project and was able to upload it to the repository, but they didn't have the proper keys. If you look at the one the attacker uploaded, you can see its hash. This is the hash of the wheel that was uploaded, which is, of course, different from the hash of the valid one from the valid supply chain owners. Basically, this is the malicious one. The verification failed, because this was an improperly signed metadata file. The first step succeeded, but then you realize that this hit the disallow rule for this step. We're going to force download it, because this client decides they want to run the code anyway, even though it's insecure, and you see the evil change. This is the bad one, this is the one that we did not verify.

Summary

Basically, the point is that you can put these two things together. We're building this tooling to really make this easy, because each of these different pieces solves a piece of the problem, but you really need to put them together to get this end-to-end protection. That's the main goal here. You can use TUF to distribute in-toto metadata, as well as the actual artifacts themselves, to get end-to-end software supply chain integrity. You're tying all these steps together. You have the output of one go into the input of the next. You can have this layout signed very securely with these offline TUF targets keys to add a layer of compromise resilience, as well as the verification properties that are present in TUF. We're preventing replay attacks as well, so attackers can't use old policies or old attestations.

I think we have a couple of places where we're already building this in practice. This is designed to be used in practice. It's used by Datadog. There's an ongoing integration with Toradex, which is an IoT manufacturer. We're also working with some of the open source communities, folks like the Python Package Index, RubyGems, and so on, about how to do all this stuff. All of this work is open source and academic, so we're able to collaborate openly, which is always fun.

Software Supply Chain Security and Web Systems

Right now, we focus a lot on software distribution, the supply chain security aspect of this. I think it has always been interesting to compare this with the web PKI systems, because I do think that one of the big differences is that this area is a lot more distributed. You actually download your software from a lot more people than you have web browsers, for example. With this idea of roots of trust, you end up having to have a bunch of them. Whereas on the web, you actually have much fewer of them, even though you still have this large collection of CAs that you trust. I do think that there are interesting applications of this in that space. I haven't investigated exactly how this would apply all the way down to the browser. I do think this idea of roots of trust is fundamental: why do you trust different things? How do you know who you're trusting? How do you know why you trust them? All of those pieces. It can be applied there. We haven't done that. It's definitely something that's interesting to look at.

Questions and Answers

Participant: I think that cryptographic hashes are a pretty solid mechanism for proving exactly what you mentioned, that an artifact is the same as it was 10 years ago, apart from hash collisions and quantum computers.

Moore: The quantum computer is different.

Participant: That was a thing before ChatGPT was released, so it's been released now, what was it generating?

Participant: I can see how this mechanism created a secure supply chain in a somewhat isolated environment. What about the external components of your supply chain, because a lot of repositories that distribute packages don't use signing at the moment. How do you incorporate the nasty outside world into your lovely secure supply chain?

Moore: One of the things about in-toto is that it's fairly unopinionated. in-toto allows for really strict, strong requirements that say every single thing has to be signed, and this thing has to lead exactly to that thing, which of course is the goal we're all working towards. It's also aware that there are steps that may differ. We don't live in a world, for example, with reproducible builds, and so you have to just trust the build system to do what it's going to do, because if you build something twice on two different computers, the hash output will be different. It's aware of these kinds of limitations in existing systems. Basically, you start by defining a layout that says, yes, we know we're pulling in some untested dependencies today, and then maybe sometime down the line you can update that layout to say, no, every dependency that's pulled in should be verified by an engineer, or whatever the process is.
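In layout terms, that tightening over time is just a change of artifact rules. A hypothetical example of how a dependency step's rules might evolve, with invented paths and step name:

# Hypothetical evolution of a layout's rules for vendored dependencies.
# Today: acknowledge that unverified dependencies are pulled in.
expected_materials_v1 = [["ALLOW", "vendor/*"], ["DISALLOW", "*"]]

# Later: require every dependency to match the attested products of a
# hypothetical "dependency-review" step before it may be used.
expected_materials_v2 = [
    ["MATCH", "vendor/*", "WITH", "PRODUCTS", "FROM", "dependency-review"],
    ["DISALLOW", "*"],
]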

Participant: npm published a worrying statistic in 2020 that only 8% of maintainers use two-factor authentication; the other 92% use a username and password. Most people dislike passwords, so getting into that supply chain is not that hard.

Moore: A lot of those package repositories are working on improving at least the 2FA piece. Even then you have things like typosquatting. People are just pulling random code.

Participant: We're almost to that point. As you were showing us the Datadog example, it seemed like they were building their own wheels because they weren't prepared to trust the ones that [inaudible 00:34:42].

Moore: I think that's the reality today: if you want to know that something comes from the source, you have to do it yourself. I think that the open source community is working towards a better world, but I don't think we're quite there yet.

 


 

Recorded at:

Nov 28, 2023
