Small Is Beautiful: How to Improve Security by Maintaining Less Code

Summary

Natalie Silvanovich explains several causes of unnecessary attack surfaces and how to avoid them. The presentation includes examples of vulnerabilities reported by Project Zero and explains how developers can prevent similar bugs.

Bio

Natalie Silvanovich is a security researcher on Google Project Zero. Her current focus is browser security, including script engines, WebAssembly and WebRTC. Previously, she worked in mobile security on the Android Security Team at Google and as a team lead of the Security Research Group at BlackBerry.

About the conference

Software is changing the world. QCon empowers software development by facilitating the spread of knowledge and innovation in the developer community. A practitioner-driven conference, QCon is designed for technical team leads, architects, engineering directors, and project managers who influence innovation in their teams.

Transcript

Silvanovich: Today I'm going to give a talk entitled "Small is Beautiful, How to Improve Security by Maintaining Less Code." I'm Natalie Silvanovich, and I'm on a team at Google called Project Zero. Project Zero's goal is to reduce the number of zero-day vulnerabilities available to attackers in the wild. We do this from a user-focused perspective, so we look at all products, not just Google products.

Attack Surface Reduction

Most of our work consists of finding vulnerabilities in these products and reporting them so they're no longer available to attackers because they've been fixed. Recently we filed our 1500th vulnerability in the five years that our team has been around. In finding all of these vulnerabilities, we've noticed that a lot of them have things in common. Specifically, a lot of them are due to needless attack surface.

The attack surface of a piece of software is basically everything that an attacker can manipulate, everything that an attacker could find bugs in, and obviously the less of this you have, the better. We've all heard the joke, it's not a bug, it's a feature. Bugs and features are intertwined I think in a more important way. Every bug started its life as a feature, and every feature introduces the risk of creating new bugs.

What attack surface reduction really is, is making sure that this tradeoff is worth it, because a lot of the bugs we find are in code where this decision was never made. It's code that basically provides no benefit to users, and yet is there creating security risk. Specifically, what we've seen is unused features, old features, code sharing, third-party code, and excessive SKUs and branching. That's what I'll talk about today.

Finally, I'm going to talk a little bit about privilege reduction and sandboxing. I don't consider this to be strictly attack surface reduction, but it's something you can do if you aren't able to reduce the attack surface.

Unused Features

To start off, I'll talk about unused features. This slide is, unfortunately, missing a cartoon. They wanted $15 per presentation to license it, and that's too much. You have to pretend that I just said something really witty about product managers putting way too many features into products.

With unused features, all code has risk. All code risks introducing security problems and other bugs into your software, so it's important to make sure this tradeoff is worth it. Obviously, if a feature isn't used, it's not. It's important to get rid of these.

To start off with an example, JavaScript. Who here has written a few lines of JavaScript or at least complained about it? Most people in here. Now, who here has heard of this feature, array symbol species (written Array[Symbol.species]), the code in the middle? Show of hands. Has no one heard of this feature? I was going to ask if anyone had put it into serious software, but I guess clearly not if no one has even heard of it. What this feature does is, in JavaScript, you can do lots of things to an array. For example, you can slice an array. What that'll do is it'll make a subarray of the array. Then there's this question. Let's say you have a subclass of an array and you slice it. Do you get an array back or do you get the subclass of the array back?

The JavaScript standard says, "Why don't we do both?" There is array symbol species. This returns the constructor that is used to create the array that is [inaudible 00:04:11] by any function on an array. This is crazy, no one uses it, and I have a sarcastic point at the end. This was also very difficult to implement and introduced a large number of bugs.
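
Since the slide isn't reproduced here, a minimal sketch of what the species getter controls may help. The class names `SubclassResult` and `PlainResult` are invented for this illustration; the only real API used is the standard `Symbol.species` static getter.

```typescript
// Hypothetical subclasses, named only for this illustration.

// Default behaviour: methods such as slice() build their result with the
// subclass constructor, so the result is a SubclassResult.
class SubclassResult extends Array<number> {}

// Overriding the species getter tells the engine to build results with
// the plain Array constructor instead.
class PlainResult extends Array<number> {
  static get [Symbol.species](): ArrayConstructor {
    return Array;
  }
}

const a = new SubclassResult();
a.push(1, 2, 3);
const b = new PlainResult();
b.push(1, 2, 3);

const slicedA = a.slice(1);
const slicedB = b.slice(1);

console.log(slicedA instanceof SubclassResult); // true
console.log(slicedB instanceof PlainResult);    // false
console.log(Array.isArray(slicedB));            // true
```

Every array method that creates a new array has to consult this getter, which is why the feature touches so much engine code.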

Here's a bug that I found that was due to array symbol species, and it is in Chakra, which means it was an Explorer Edge bug. Here's the code that causes this bug. The way this works is, I'm a malicious person, I create a webpage containing this code and then I convince someone to visit the page and then that compromises their browser. Here is the code. You start off at the bottom, you're creating the MyArray, and what's important here is this is a mixed-type array. You've got objects and arrays and my name and numbers in there. Then you call filter on it. That's the very bottom. What filter does to an array is it will run a test on every element of an array and then put that element into a new array if it passes. In this case, it always passes, so it should just copy the array into a new array.

Of course, it's the subclass of an array, so you get to that question, is the thing it returns an Array or a MyArray? Then the engine calls array symbol species. Then this returns this class dummy, which returns this other array, which is actually what this array is going to be copied into.

This turns out, in the JavaScript engine, to be both a type confusion bug, because the engine assumes that this is going to be a mixed-type array that it's copying into, but it's not, and a memory corruption vulnerability, because this array is too short, so it also writes off the end of the memory. This is the actual code from the script engine. The interesting part is there's the array species create, and then you can see that it basically just copies into it without checking anything.
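
The slide code isn't included in the transcript, so the following is a reconstruction of the general shape described above, not the original proof of concept. `Dummy`, `MyArray`, and `target` are illustrative names, and the array contents are placeholders.

```typescript
// A short, pre-existing array that the attacker wants the engine to write into.
const target: unknown[] = [1.1, 2.2];

// A constructor that ignores its arguments and hands back `target`.
// Returning an object from a constructor replaces the newly created instance.
class Dummy {
  constructor() {
    return target;
  }
}

class MyArray extends Array<unknown> {
  // The engine asks this getter which constructor to use for results.
  static get [Symbol.species](): ArrayConstructor {
    return Dummy as unknown as ArrayConstructor;
  }
}

// A mixed-type array: objects, arrays, strings and numbers.
const mixed = new MyArray();
mixed.push({}, [], "natalie", 7, 7);

// The test always passes, so every element is copied into whatever the
// species constructor returned, which here is `target`.
const result = mixed.filter(() => true);

console.log(result === target); // true
console.log(target.length);     // 5
```

In a standards-compliant engine this is harmless: `target` simply grows to hold the copied elements. Per the talk, the vulnerable engine instead assumed the destination was a fresh array of the right kind and size, so the same script wrote past the end of a shorter, differently typed buffer.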

This is just one of I think about 15 or even 20 vulnerabilities involving this array symbol species part of JavaScript that have been reported, which is problematic because I did a scan of the internet. I think there are roughly 150k pages that use this, which is not very much, especially if you consider that a lot of them are JavaScript tutorials. Also, this is from a marker in Chrome, you can add to Chrome basically a counter that for beta users will count how many times a feature is used, and based on that, 0.0015% of all page loads use array symbol species. This is minuscule. This is an example of a feature that I think is way more risk than the benefit it provides.

To give another example of a bug like this, this was recently filed by someone on my team, Jann Horn, and this is a bug in IP compression. This is part of the IP basic standard, and it allows IP packets to be compressed. On a Mac, there was a memory corruption vulnerability, and typically these types of vulnerabilities can allow an attacker to execute code in the context of the software where the bug is. Anyhow, this was in this IPComp feature, which was hardly ever used if you look at traffic on the internet and was broken on macOS, so it didn't even work correctly. That meant it was very unlikely that any user was using it. They fixed it by just turning this off.

Those were two examples of unused features. You might wonder, why does this happen? Why do things like this stay in software? I have a quote from Tumblr on a picture of a boat to answer that, which is that in software, once you put a feature in, it can be very difficult to go back.

There's ways you can deal with this. Lots of software will put in experimental features. For example, this is a flag from Chrome saying that this feature is experimental. They will often do this, and they do it in Firefox too, to just test if a feature gets uptake. Then they can take it back if it doesn't and it turns out to be too risky. Of course, this doesn't actually solve the problem that if you have that tiny number of users, if you remove the feature you might break people, but at least I guess those people have been warned. Something I find encouraging is I've started to see an increasing willingness among vendors to remove features where there's that very low usage and tons of vulnerabilities. This is, for example, from the Blink mailing list, Blink being the HTML engine for Chrome. They're removing these HTML features because they have tiny usage, like 0.01% of all page views, and they caused just tons and tons of bugs. Here's an example from Microsoft Office. In this case, they're disabling one format that Microsoft Word can open, and once again it had this problem where there was just a tiny number of users and a huge number of bugs, so they just turned it off. Likewise, they disabled VBScript in Internet Explorer for the same reasons.

How do you avoid this problem? Base features on user need. That sounds silly, but it's important to make sure that, especially if you know a feature is going to be really complex, really likely to have bugs, that there's enough usage of it that it's worth it. It's also a good idea to track feature use in beta or production if it's possible, because some of the worst situations I've seen is where companies have just thrown something out the door with no way to know if anyone's using it, and then it becomes very difficult to take features back because you have no way to know who is using it, is it a large number, is it a very important customer, that sort of thing. Tracking is good here.
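
As a rough illustration of tracking feature use, here is a minimal sketch of a use counter. `UseCounter` and its methods are hypothetical names, and the data is made up; a real product would report these counts through whatever telemetry it already has.

```typescript
// Hypothetical in-memory use counter, for illustration only.
class UseCounter {
  private totalLoads = 0;
  private loadsUsing = new Map<string, number>();

  // Record one page load (or session) and the features it touched.
  // Each feature is counted at most once per load.
  recordLoad(featuresUsed: Iterable<string>): void {
    this.totalLoads += 1;
    Array.from(new Set(featuresUsed)).forEach((feature) => {
      this.loadsUsing.set(feature, (this.loadsUsing.get(feature) ?? 0) + 1);
    });
  }

  // Percentage of recorded loads that used the feature at least once.
  usagePercent(feature: string): number {
    if (this.totalLoads === 0) return 0;
    return (100 * (this.loadsUsing.get(feature) ?? 0)) / this.totalLoads;
  }

  // Features whose usage is below the threshold: candidates for review.
  belowThreshold(thresholdPercent: number): string[] {
    return Array.from(this.loadsUsing.keys()).filter(
      (feature) => this.usagePercent(feature) < thresholdPercent
    );
  }
}

const counter = new UseCounter();
for (let i = 0; i < 1000; i++) {
  counter.recordLoad(i === 0 ? ["slice", "species"] : ["slice"]);
}
console.log(counter.usagePercent("species")); // 0.1
console.log(counter.belowThreshold(1));       // [ 'species' ]
```

With a number like this in hand before a feature ships widely, the decision to keep or remove it can be made on data rather than guesswork.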

It's also good to be willing and able to disable features. These are two different things. I've certainly encountered vendors who have told me, "We will never turn off a single feature if a single user is using it." While this is an opinion, I think that it's important to at least consider turning off features if they have very small usage if the cost of this is protecting all your customers. Like with the array symbol species, a very small number of people are actually using it, but the security impact actually affects everyone. Everyone could be tricked into visiting that malicious website, even if they never use any websites that legitimately use this feature.

It's also important to be able to disable features. Occasionally I run into situations where vendors will want to turn stuff off. Then when they try they realize it's too inextricably linked to other stuff. It's just really difficult.

There's a lot of cases for modular code, but security is one of them. To give an example, does everyone remember that bad FaceTime bug where someone could add someone to a group call and then if they added themselves again, it would just pick up that group call and then you could just hear on the other end? What Apple did temporarily was they turned group FaceTime off, which meant that people could still use FaceTime for a lot of things and not be subject to the bug. Imagine if they didn't have modular code. Then they might have to take down all of FaceTime. This is how modular code can help you. It means that, in an emergency, you can turn features off.
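
A minimal sketch of that idea follows, assuming each feature sits behind its own entry point and a switch. `FeatureSwitches`, `startCall`, and the feature names are hypothetical and say nothing about how FaceTime is actually built.

```typescript
type Feature = "oneToOneCall" | "groupCall";

// Hypothetical kill-switch registry. In a real system the disabled set
// would come from configuration the vendor can change quickly.
class FeatureSwitches {
  private disabled = new Set<Feature>();

  disable(feature: Feature): void {
    this.disabled.add(feature);
  }

  isEnabled(feature: Feature): boolean {
    return !this.disabled.has(feature);
  }
}

// Each feature is checked separately, so one can be turned off
// without touching the other.
function startCall(switches: FeatureSwitches, participants: string[]): string {
  const feature: Feature = participants.length > 2 ? "groupCall" : "oneToOneCall";
  if (!switches.isEnabled(feature)) {
    return `${feature} is temporarily unavailable`;
  }
  return `${feature} started with ${participants.length} participants`;
}

const switches = new FeatureSwitches();
switches.disable("groupCall"); // emergency response to a reported bug

console.log(startCall(switches, ["alice", "bob"]));
// oneToOneCall started with 2 participants
console.log(startCall(switches, ["alice", "bob", "carol"]));
// groupCall is temporarily unavailable
```

The switch only helps if the risky code is reachable solely through the guarded entry point, which is the modularity argument made above.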

This is also, I think, from Tumblr. It's never too late to do things differently, and palm trees.

Old Features

Even though we've seen a lot of bugs due to unused features, what's more common is that there are features that were useful, but aren't anymore. They fell slowly into disuse as things have become more modern.

This can lead to bugs because, to start off, you have the risk-benefit things. If there's no users, it's just all harm to your users due to the feature being enabled, but also this code can be higher risk. To start off, old code was also often written before what I would call the modern security environment, before we knew all the types of bugs that could cause security problems. Old code, it's not uncommon that it will have bugs you wouldn't expect to find these days in it. Also, the lack of attention to this code can make it higher risk. People aren't going to notice necessarily if it has bad bugs in it.

Here's an example of a bug due to old code. I found this vulnerability in Adobe Flash in 2017. It was a use after free, a dangling reference, like that Mac issue. I found out initially that it was a result of a hack made to Adobe Flash to load macromedia.com in 2003. Does that website still exist? No, and it hasn't existed for many years and yet this hack kicked around. There was a vulnerability for 14 years. This is an example. Temporary hacks, if you don't remove them, can introduce bugs that don't need to be there.

Here's another example, and this was also found by Jann Horn in VirtualBox. This is basically a virtual machine, and it's a guest-to-host escalation. It was a way to break out of the virtual machine. It had two root causes. One was that old code wasn't fully removed and the other was that it was fixed in upstream but not downstream, and I'll talk about that later.

Here's the source of the bug. You'll notice the most important part is to-do, remove this code, it's no longer relevant. Unfortunately, this didn't get to-done, and this remained a vulnerability for quite a while. This is a case for prioritizing removing this code, because if you don't remove code that's unused, you're risking that vulnerabilities or other bugs will occur due to it.

Here's another example. I found this bug recently in iMessage, and it's Mac only. It's a bug where you send an iMessage to another Mac and it causes memory corruption immediately with no user interaction. We actually exploited one issue like this so it can be used to get arbitrary code execution on another device without touching it. A very serious vulnerability.

This one occurred in deserializing a URL that was sent in that message, and it turned out on a Mac when you deserialize a URL, it can process a bookmark format from 2011, so eight years old. This was fixed by removing the ability to parse that format, but I think if it had been removed a bit earlier when this format wasn't used anymore, we wouldn't have found that bug.

What can you do? Yet another case for tracking feature use and getting rid of the features that are no longer used. If you can't track feature use, sometimes it can be good to run content stats. To give an example for browsers, there are some good ways to track if things are being used. Another good thing is to scan the internet and see what's actually being used. That is almost as good of a statistic.

It's also a good idea to compare usage to reported security issues. For example, I used to be on the Android security team and when people externally reported bugs, there were a few buckets of specific features that had a lot of bugs. Those can be good candidates to have their attack surface reduced or changed in some way.

It's also a good idea to prune your code trees regularly. Look for code that hasn't been modified in a while and figure out why. Figure out if anyone is still actually using it. It can also be helpful to refactor older code. This will reduce some of the risk of just code being less secure because people didn't know as much about security back then.

Also, it's important to make sure that all code has an owner. There's two reasons for this. First, we've definitely reported vulnerabilities where the code has no owner and it takes a really long time to fix because the vendor can't figure out how to get the specific component fixed or who knows anything about it. That's bad if someone like my team reports a vulnerability, but it's even worse if this happens when there's an attacker in the wild using the bug and you can't find anyone who knows anything about the code and can fix it.

It's also true that if every piece of code has an owner, there's someone who wants to get rid of that code and no longer own it. That can hasten the deprecation process quite a bit.
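
The pruning and ownership checks described above could be sketched like this. The inventory format, file paths, and the `findPruningCandidates` helper are all hypothetical; the metadata is assumed to come from version control.

```typescript
interface CodeUnit {
  path: string;
  owner: string | null; // null means nobody is responsible for it
  lastModified: Date;
}

interface Finding {
  path: string;
  reasons: string[];
}

// Flags code that has no owner or has not been touched for `maxAgeDays`.
function findPruningCandidates(
  units: CodeUnit[],
  now: Date,
  maxAgeDays: number
): Finding[] {
  const msPerDay = 24 * 60 * 60 * 1000;
  const findings: Finding[] = [];
  for (const unit of units) {
    const reasons: string[] = [];
    const ageDays = (now.getTime() - unit.lastModified.getTime()) / msPerDay;
    if (unit.owner === null) reasons.push("no owner");
    if (ageDays > maxAgeDays) reasons.push(`untouched for ${Math.floor(ageDays)} days`);
    if (reasons.length > 0) findings.push({ path: unit.path, reasons });
  }
  return findings;
}

// Made-up inventory for illustration.
const codeUnits: CodeUnit[] = [
  { path: "player/legacy_site_hack.c", owner: null, lastModified: new Date(Date.UTC(2003, 4, 1)) },
  { path: "net/parser.c", owner: "net-team", lastModified: new Date(Date.UTC(2019, 4, 20)) },
  { path: "fs/bookmark_v1.c", owner: "fs-team", lastModified: new Date(Date.UTC(2011, 2, 1)) },
];

const candidates = findPruningCandidates(codeUnits, new Date(Date.UTC(2019, 5, 1)), 730);
for (const finding of candidates) {
  console.log(`${finding.path}: ${finding.reasons.join(", ")}`);
}
```

A report like this does not decide anything by itself; it gives someone a list of code to ask questions about.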

Code Sharing

Another thing that leads to unnecessary attack surface is code sharing, and there's two sides here. On one hand, if you use code for multiple purposes too much, especially if the code is used for something that doesn't have a lot of privileges and then something that is very security-sensitive, you can have this problem where someone will add in a feature for the less sensitive stuff and it will cause a huge security risk for the more sensitive stuff.

Sometimes code sharing can lead to extra attack surface, but on the flip side sometimes there are too many copies of a piece of code, and then it leads to stuff being difficult to maintain and risk of bugs not getting fixed.

Here's an example of some bugs we filed a few years ago on the Samsung S6 Edge device. There were memory corruption issues due to image processing. You would download an image from the internet and it would hit this bug, and once again cause memory corruption. We found out it was due to bugs in a codec called QJpeg by a company called Quram. Here is Quram's website, and you'll notice they have a long and illustrious history that ends in 2008.

I thought this was a case of the unused code. Then I looked into it, and it turns out that this image decoder is actually used for one thing. When you start up your device and it sings and it has the carrier logo that it displays, it uses this codec. Nothing else. Unfortunately, the way this was implemented is it was just plonked into the Android image subsystem. Then it was available everywhere for images from the internet, images in the gallery, that sort of thing. This is an example of, they shared this code really broadly when it didn't need to be. If they had just limited it to loading that one image on startup, this wouldn't even be a security bug because if you have enough access to a device, you can swap out that image. You can already do a lot of stuff on that device. Now this code was moved so it was used to process stuff that's completely untrusted off the internet, and that's what caused the problem.

To give a flip side, this is a case of too many copies of code. When I was on the Android team, which was about five years ago now, there were many copies of the web view, which is the code that processes HTML and JavaScript on the Android devices and many features copied this code, and sometimes there would be bugs in all of the code and they would be fixed in one version but not in another. Every time there was a bug that was reported, there was so much work to fix because there were so many different versions.

One thing we did is we moved to a unified web view so that now the code can be updated in one place and it's much easier to fix vulnerabilities.

Here's another context issue, and this happened in some iMessage vulnerabilities that I reported recently. These ones actually work on iPhone and Mac, and we've exploited one of these. These are basically once again remote issues: you don't touch the target device and you get code execution. All of these bugs happened due to deserialization in iMessage. The problem was that there's this class that's used for deserialization basically everywhere on Mac and iOS systems, and one of its features is that when you decode a class like an array, it will also decode its subclasses.

This is useful in a lot of contexts, especially local contexts. In the remote context, it was a lot of attack surface. It meant that you could deserialize all these classes that weren't necessary, and basically all these bugs were in these subclasses.

Apple fixed these issues, but they also added a serialization mode where you could just decode the class and not the subclasses. If this had been reduced before we started looking for these bugs, we would have found zero bugs. All of these bugs would have been prevented by reducing the attack surface, which, specifically in the context of iMessage, didn't add any features that were useful to users.
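
To illustrate the difference between the two decoding modes, here is a small sketch under stated assumptions. The class registry, `Payload` shape, and `decode` function are invented for this example and do not correspond to Apple's APIs.

```typescript
// Hypothetical class registry: each decodable class names its parent.
const parentOf = new Map<string, string | null>([
  ["BaseArray", null],
  ["FancyArray", "BaseArray"],          // rarely used, extra parsing code
  ["LegacyBookmarkArray", "BaseArray"], // old format nobody sends anymore
]);

interface Payload {
  className: string;
  items: unknown[];
}

// True if `className` is `ancestor` or inherits from it.
function isSameOrSubclass(className: string, ancestor: string): boolean {
  let current: string | null = className;
  while (current !== null) {
    if (current === ancestor) return true;
    current = parentOf.get(current) ?? null;
  }
  return false;
}

// In strict mode only the exact class is accepted. Otherwise any subclass
// of the expected class is decoded too, which widens the attack surface.
function decode(payload: Payload, expected: string, strict: boolean): unknown[] {
  const accepted = strict
    ? payload.className === expected
    : isSameOrSubclass(payload.className, expected);
  if (!accepted) {
    throw new Error(`refusing to decode ${payload.className}`);
  }
  return payload.items.slice();
}

const incoming: Payload = { className: "LegacyBookmarkArray", items: [1, 2] };
console.log(decode(incoming, "BaseArray", false).length); // 2
try {
  decode(incoming, "BaseArray", true);
} catch (e) {
  console.log((e as Error).message); // refusing to decode LegacyBookmarkArray
}
```

In the lenient mode every subclass's parsing code is reachable from a remote message; in the strict mode only the one class the feature actually needs is.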

How do you prevent this? Make sure every attack surface supports only the needed features and be very careful about that situation where you have the code that is used for the low privileged context and the very sensitive context, because you need a way to make sure that people aren't putting stuff into the component that's not very sensitive, not realizing that it's also being put into this very sensitive component. One easy way to do that is to split the code. There's other ways like code review, but it's important to think about.

Also, avoid multiple copies of the same library. This is a situation where you're just creating more risk of bugs usually with no benefit to users.

Third-Party Code

Now I'm going to talk a little bit about third-party code. Third-party code is a frequent cause of unnecessary attack surface. There's lots of ways it can go wrong. It can be straight-out misused, it can support extra features, it can go without updates, and it can interact with stuff in unexpected ways.

Let me give some examples of all of these. To start off, straight-out misuse. This is a vulnerability that [inaudible 00:22:58] and I found in the FireEye Malware protection system. This is a device that will just sit on your network and scan stuff that's coming in over the network and see whether it's malicious, see whether it's a known virus, that sort of thing. One thing that this will do is, if you send a JAR file over the network, it will decompile it and make sure that it's safe. The way the system did it is they used a third-party library called JODE, and it will decompile the stuff on the network with it. We contacted the developer of this component and he told us, "You shouldn't do that. It executes the code while it decompiles it."

This vulnerability was pretty basic. We sent in the JAR that was decompiled using this feature, and yes, it did, in fact, execute our code in this appliance. Anyhow, this is a situation of plug and play. Find the thing that does what you need, put it on without thinking about security, and that can lead to all sorts of security issues including stuff like this.

Now here's an example of something being done right. Another risk with third-party software is it has all these features you don't use, because third party software is never made exactly for your application. There's always extra stuff in there. In this case, there was a memory corruption vulnerability in Linux, and it didn't affect Android because Linux supports flags that you can use to turn off features. Android had actually set these flags.

This is good design from the Linux perspective in that they make it so that people who use Linux can turn different features off and it's good design from the Android perspective because they actually made a point of turning off everything they weren't using.

Here's a more complicated example of isolating yourself from third-party software. James Forshaw on my team worked with Adobe to do what he called win32k lockdown. The idea is that for a while there were vulnerabilities in Adobe Flash all the time. They would lead to arbitrary code execution in the context of the browser. To actually start accessing interesting data, you need to break out of the browser sandbox and access the OS.

A common way this was done on Windows was using an API called win32k. It was fairly old and had a lot of vulnerabilities in it. What James did is, on Windows there's a way that you can compile a binary so that it can't use win32k. He went through all of the Flash source and removed everything that needed that API, replaced it with something different and then you could turn on this flag and compile it so that it couldn't use win32k. Then, when there was a vulnerability, this API wouldn't be accessible to the malicious code. This is an example of Adobe Flash isolating itself from the OS, the third-party software that had a lot of vulnerabilities in it.

Another thing that happens is lack of updates. When I worked on Android, this was a common problem. There were a lot of things that had delays in updates, web view, media, Qualcomm, Linux. This is an ongoing problem and they're still working on basically reducing the window between bugs being found and then being updated on Android.

If you think about it this way, if you make some software, an attacker might want to put in a lot of effort to find a bug and attack your users. If you have a third party library in there that isn't updated, that's free. Why would they spend on it? They just get free bugs. It's really important to make sure these libraries are up to date. Otherwise, you're making the bar very low for an attacker in lots of situations.

When I was working on the Android stuff, someone said something to me that really rang true. He said, "What people need to know is that a puppy isn't for Christmas, a puppy is forever." When you think about third-party software, this is sort of the case. When you get it, it's new and it's exciting and it solves all of your problems and you love it, but you don't always think about the lifetime of maintenance it will require.

If you think about it, when you integrate a third party component into your system, in some way you are inextricably tying the future of your product to the future of that product. Even if you remove it that's an amount of effort you're going to have to put in. There's going to be changes you're going to have to make. This isn't a decision that you can make lightly. It's a decision that you need to make really with a solid understanding of the cost of updating the software and reducing its features and all this attack surface stuff upfront.

What do you need to do with third-party software? It's very important to track it and to have a process for using it. Some of the worst situations I've dealt with have been when people don't even know they were using that component. Someone who's no longer at the company put it in and no one knows it's there. The way to prevent this is to make sure that you're tracking it and that you have a process for use, that it's not just one person making this decision, it's something you're doing as a team with an understanding of what's going to be done to maintain it.

It's important to trim unnecessary features from third-party libraries, and a lot of them support this. If you look at stuff, for example, that supports video decoding, they'll often let you turn specific codecs off, turn specific features off. It's a good idea to use these flags as much as possible and just get rid of everything you're not using, because then if there's bugs in it, those bugs won't affect you.

Also, apply security updates frequently. You don't want to give attackers free bugs, so make sure you just squash them as soon as they're found.
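
Tracking and updating can be combined in a simple inventory check, sketched below. The component names, versions, and the `fixedIn` data are made up, and `compareVersions` assumes purely numeric dotted versions, which real libraries do not always use.

```typescript
interface Component {
  name: string;
  version: string; // dotted numeric version currently shipped
  addedBy: string; // team accountable for maintaining it
}

// Compares dotted numeric versions such as "1.2.10" and "1.3".
// Returns a negative number if a < b, zero if equal, positive if a > b.
function compareVersions(a: string, b: string): number {
  const partsA = a.split(".").map(Number);
  const partsB = b.split(".").map(Number);
  for (let i = 0; i < Math.max(partsA.length, partsB.length); i++) {
    const diff = (partsA[i] ?? 0) - (partsB[i] ?? 0);
    if (diff !== 0) return diff;
  }
  return 0;
}

// `fixedIn` maps a component name to the first version containing the
// latest known security fix (hypothetical data, e.g. from advisories).
function outdatedComponents(
  inventory: Component[],
  fixedIn: Map<string, string>
): Component[] {
  return inventory.filter((component) => {
    const fixed = fixedIn.get(component.name);
    return fixed !== undefined && compareVersions(component.version, fixed) < 0;
  });
}

const inventory: Component[] = [
  { name: "libvideo", version: "2.4.1", addedBy: "media-team" },
  { name: "libjson", version: "1.10.0", addedBy: "platform-team" },
];
const fixedIn = new Map<string, string>([
  ["libvideo", "2.4.3"],
  ["libjson", "1.9.2"],
]);

console.log(outdatedComponents(inventory, fixedIn).map((c) => c.name));
// [ 'libvideo' ]
```

The `addedBy` field is there for the ownership point: when a component shows up as outdated, someone specific is on the hook for updating it.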

Excessive SKUs and Branching

Another thing that can lead to unnecessary attack surface is excessive SKUs and branching. A SKU is a stock keeping unit. That means if you make hardware every unique model you make has a SKU. This often corresponds to a unique software release.

By branches, I mean release branches. I don't mean you branch your software and you develop for a while and then you merge it back in. That doesn't really have security risk. When you split your branch into 10 branches and maintain them all, that increases the security risk quite a bit. It means that you basically increase your risk of introducing bugs when you're patching stuff, and also it means it is now 10 times the effort to patch everything.

This is a page from a Dr. Seuss poem, "Too Many Daves". It tells the tragic tale of Mrs. McCave, who made the timesaving yet shortsighted decision to name all her sons Dave, and yet now she has problems. She has her 27 sons running around and she can't tell them apart. When she calls them, they don't come. No one knows what she's talking about. This is what will happen to you if you have too many SKUs.

I'm going to give some examples of vendors who had SKUing and branching problems. In this case, I'm going to anonymize the vendor because they told me this information in confidence.

We have vendor number one and vendor number one is a large software vendor. They make a lot of products. They make a lot of software and then a small amount of hardware. We found a bug that was in two products, a new product and an old product. When we got the fix back, it was in the new product and not the old product. I thought what had happened was that they just forgot to fix it in the old one. That's actually something that really happens with branching.

What actually happened is they said that this was pretty much a build issue. The way their tree works is they have the new one, it goes in, it merges, it merges, it gets to the old one, it merges, it merges. Then it gets to these tiny hardware products. This broke the build. Then they start reverting and reverting, and then apparently they cut it at that point and that's how this bug didn't get fixed.

This is just an example of how you have these really complicated build processes. This increases the risk for a lot of different reasons that stuff doesn't get fixed. Either you forget to put it in or it gets reverted accidentally or it breaks the build and it takes a lot of time to fix it. All sorts of bad things come from having a really large number of branches.

Now I'm going to talk about vendor number two. I have to warn you, every time I give this talk five people come up to me and are like, "Am I vendor number two?" I just want to warn you, you are probably not vendor number two. Everyone thinks this. If you are, at least you're in good company.

Vendor number two releases 365 SKUs per year, which is one device per day. They only make devices. They don't make anything else. Once we found a vulnerability that affected all of their devices. We typically give vendors 90 days to fix a bug, and they couldn't fix it in 90 days because imagine over three or four years you have 1000 devices to fix this on. That made it very slow.

Because of all of this, they couldn't tell us a fix date and they certainly couldn't synchronize the fix date. We asked them about fix saturation. How long do you think it will be until about 50% of devices are fixed? They were just, "That's inherently unknowable." Which is true if you have 1000 SKUs.

This is just a case where SKUs get out of control. It's really hard to do security updates. This company has recently started reducing the number of SKUs they have to solve this problem, although this is more challenging because they don't have support periods. They also don't really know when they're going to be done, when they can stop supporting this really large number of SKUs because they have no way to know whether anyone's using them or they don't have a date where it's ok to cut people off.

Here's another example. This is Android related. If you've ever looked at the Android tree, they have a lot of branches and they have a lot of release branches that they maintain. What happened here is, there was a vulnerability in their main kernel branch that got fixed in 2017. This didn't make it into the branch that is used to compile the pre-built kernel that gets put onto most Android devices.

This is a situation of too many branches, and this fix didn't make it to the branch it needed to be in to get to all of the devices. Now, two years later, my team discovers that people are using this vulnerability in the wild. This is a worst-case scenario of having too many branches. You find this vulnerability, you fix it, it doesn't make it into something, but attackers see it and then they're able to use that bug basically until you realize your mistake and fix it everywhere.

How do you avoid this? Avoid branching, avoid SKUs. That's easier said than done. What I'm really saying here is make sure you understand the costs of cutting a branch and maintaining it and of creating a SKU.

It's not uncommon for companies to do this early on without appreciating the cost, and then regret it later. It's a good idea to have a documented support period for everything. That means if you really mess this up, there's a timeline, even if it's really long, in which you can fix stuff. It's also a good idea to robustly test every branch and product. It's obviously better to not have that many branches or SKUs, but in the worst-case scenario, if you can't reduce them, testing helps. If you have a system set up where every time you fix a vulnerability, you can test that it's fixed on every branch, and you can also test that every branch still functions correctly, that's a big head start on making it actually possible to fix a vulnerability in every single branch and to do it quickly.
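As a rough illustration of checking that a fix has reached every branch, here is a minimal Python sketch. It assumes each fix carries a tracking ID and that you can list which IDs each maintained branch contains; the branch names, the IDs, and the `branches_missing_fix` helper are all hypothetical and not from the talk.

```python
def branches_missing_fix(branch_fixes: dict[str, set[str]], fix_id: str) -> list[str]:
    """Return the maintained branches that do not yet carry the given fix."""
    return sorted(
        name for name, fixes in branch_fixes.items() if fix_id not in fixes
    )


# Hypothetical inventory: which fix IDs each maintained branch contains.
maintained = {
    "main": {"FIX-101", "FIX-102"},
    "release-a": {"FIX-101", "FIX-102"},
    "prebuilt-kernel": {"FIX-101"},  # the fix never made it here
}

# Any branch listed here still ships the vulnerability.
missing = branches_missing_fix(maintained, "FIX-102")
print(missing)
```

A check like this only says where a fix is absent. It would still need to be paired with tests that the vulnerability is really gone and that each branch still works.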

Sandboxing and Privilege Reduction

Now I'm going to talk a bit about privilege reduction and sandboxing. This isn't really attack surface reduction, but this is what you can do if you have really risky and bug-prone code that you can't get rid of, and whose attack surface is really hard to reduce. To start off, privilege reduction is basically just reducing the privileges of a process. It's not uncommon for us to find stuff that can access stuff it doesn't need, and this is fairly simple. If you can just figure out what stuff needs and reduce it, that's protecting your users in case there's ever a bug.
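The "figure out what stuff needs and reduce it" step can be sketched as a simple comparison between what a component is granted and what it uses. This is only an illustration: the privilege names and the component are made up, and a real audit would pull them from your platform's permission system.

```python
def excess_privileges(granted: set[str], needed: set[str]) -> set[str]:
    """Privileges a component holds but does not need."""
    return granted - needed


def reduced_grant(granted: set[str], needed: set[str]) -> set[str]:
    """Return the smallest grant that still covers everything needed."""
    missing = needed - granted
    if missing:
        # The component needs something it was never granted; flag it
        # instead of silently widening the grant.
        raise ValueError(f"needed but not granted: {sorted(missing)}")
    return granted & needed


# Hypothetical media decoder that was given far more than it uses.
granted = {"read_media", "network", "contacts", "camera"}
needed = {"read_media"}

print(sorted(excess_privileges(granted, needed)))
print(sorted(reduced_grant(granted, needed)))
```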

There's also sandboxing, and sandboxing is typically for when you can't just reduce the privileges. If something is both very risky and needs access to privileged stuff, then you can split it into different pieces and then use IPC to communicate between the pieces. That can allow you to reduce the risk from the risky code while still being able to access the stuff you need.

It makes existing vulnerabilities less severe, although often this will require splitting functionality, and especially with sandboxing, it's not something that you can do without understanding your code. It requires you to have a clear risk analysis of all functionality in a piece of software. To sandbox effectively, you need to basically know what you have, know what's high risk and know what you want to split out.

Also, sandboxing adds an upfront cost. It means that now, on an ongoing basis, you need to start evaluating stuff. What component does it go in, what risk does it pose, and how do you make sure that you're not making your sandbox weaker every time you add something new?

To give some examples, a few years ago there was a series of vulnerabilities in Android called Stagefright. These were image decoder issues. They were presented at Black Hat by jduck and then many people found similar vulnerabilities.

At that time these image decoding libraries and media decoding libraries were just libraries. You would load them into your process and call them, and then they would have whatever privilege level your process had, which increased the risk. This was fixed by splitting out these libraries into their own service in Android and reducing the privileges of that, and then the library was just this stub that called into this service. This means that no matter where you called them from you would have low privileges. That protected the users somewhat in the case that there was a vulnerability in the service, though of course none of this stuff is perfect and there always still is some risk posed by vulnerabilities.

Here is a picture of the Chrome sandbox. This is just an example of sandboxing. The important part here is at the bottom, where you have the two renderer processes. These are the high-risk code. They parse all the HTML, the JavaScript, that sort of thing, and renderer processes have vulnerabilities in them all the time. I don't even want to count, but I would say every release cycle, every month, there are several vulnerabilities.

The rest of the system is protected because they use IPC to talk to the other components. Let's say you access a file in a browser, you hit something, it goes through the renderer process. Then the renderer process communicates with this and says, "Grab me this file," and then it'll get the file back, and that means that the process itself doesn't need to have access to that file and that this behavior can be restricted in some ways.
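To make the request-and-reply pattern a bit more concrete, here is a minimal Python sketch of the privileged side of such an exchange. It is not Chrome's design: the `FileBroker` class, the message format, and the allowed directory are assumptions for illustration, and the actual process boundary and IPC transport are left out.

```python
from pathlib import Path


class FileBroker:
    """Privileged side of a sandbox; serves file requests from untrusted code."""

    def __init__(self, allowed_root: str) -> None:
        self.allowed_root = Path(allowed_root).resolve()

    def _resolve(self, requested: str) -> Path:
        # Resolve ".." segments and symlinks before any access check.
        return (self.allowed_root / requested).resolve()

    def is_allowed(self, requested: str) -> bool:
        # Only files strictly inside the allowed directory may be read.
        return self.allowed_root in self._resolve(requested).parents

    def handle_request(self, message: dict) -> dict:
        # Treat every field of the message as attacker-controlled.
        if message.get("op") != "read_file":
            return {"ok": False, "error": "unsupported operation"}
        path = message.get("path")
        if not isinstance(path, str) or not self.is_allowed(path):
            return {"ok": False, "error": "access denied"}
        try:
            return {"ok": True, "data": self._resolve(path).read_bytes()}
        except OSError:
            return {"ok": False, "error": "read failed"}
```

The sandboxed process would send a message such as `{"op": "read_file", "path": "report.txt"}` over IPC and get back either the data or an error, so it never needs file access of its own.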

I want to warn you against jumping to sandboxing, though, by using a bit of an analogy here. I think in school we were all taught about the different ways to protect the environment: reduce, reuse, recycle. The most important one is reduce. If you don't buy the thing, it definitely won't hurt the environment. If you do have to buy the thing, if you use it as much as possible before buying another thing, that protects the environment a bit. Then your worst option is to recycle. If you buy the thing, use it once and recycle it, it's still worse for the environment than the other two.

I think this is very much true about sandboxing. You're best to just get rid of the thing and then you should try to reduce its privileges, and then you should sandbox. I think people talk about sandboxing really frequently without exploring their other options. I think it's something that's very important to do, but it's also important to consider whether you need the thing that you're sandboxing in the first place as well.

Conclusions

In conclusion, there's a lot of things you can do to reduce attack surface. You should consider the security impact of features and design and make sure you're considering the security cost of the feature before adding it in. You should track feature use and remove old and unused features. Carefully consider third party code and keep it up to date, reduce SKUs and branches, have a support period for every product. Sandbox and reduce privileges in the case that you can't just reduce serious attack surface entirely.

I'm going to end off with Smokey the Bear. Only you can reduce attack surface. I want you to think about the software you develop or the software you design or the team you manage, and think of one thing you can do to reduce attack surface. Attack surface isn't all or nothing. Every little thing you can do to reduce it will reduce your product's susceptibility to attackers. Let's think about how we can all reduce attack surface and protect our users.

Questions and Answers

Moderator: I have a first question. If you were to prioritize the things, what would you do first and what would you do last?

Silvanovich: Yes, this is a tough one, and it's also because these things have different costs. I think for the stuff I've looked at, the big wins are often removing the unused features. Sometimes that can be very easy, low cost. Also, the third-party code is a huge one, and that's something I often see big wins in. Things like reducing the SKUs, that's hugely valuable, but it's also often hugely expensive and time-consuming. I'd say maybe that one's not as important, but it really depends on what you're looking at.

Often when people start looking at attack surface, there's a few things where they can tell right away as easy, big win, and then there's things that are more expensive that they have to consider more closely.

Moderator: So usually, on security, it depends on the threat model.

Participant 1: Thanks for the great talk. I have a question. You say that we should carefully consider adding or importing dependencies on third-party libraries. If we really need that feature and there is a library that implements it, we need to choose: do I bring this thing into my system, or should I implement it myself and risk it having a bug, which will cause a security failure?

Silvanovich: That's a really good point and I actually forgot to say that. I'm not against third-party libraries. I think there are definitely situations where the risk of using the third party is less than the risk of running your own.

Some examples of this would be things like crypto and image decoders, things where there are really good mature solutions and your chance of writing a good one isn't as high as what's out there. I think where there tend to be more problems is the stuff that's less well used, or people who include it for a tiny thing for which the code is outsized.

I think the way to balance this is if there's a really good mature solution that does security updates and you're committed to updating it, I think that's usually better than writing your own software. It's where you start getting into things with immature software, no security updates, not very many users, I think that's when you should either spend a lot of time evaluating that solution or seriously consider writing your own.

Participant 2: You mentioned the point where you do not want to reuse the libraries in multiple places, but nowadays, having the [inaudible 00:45:06] it allows you to have it decentralized. You can download those plugins anywhere from the internet as well by just having [inaudible 00:45:17] plugins.

It solves a lot of purpose for the developer because you get, on the fly, a lot of libraries. It makes it easier for the development. How do you balance between these two things, where you were suggesting not to use the libraries in multiple places?

Silvanovich: For third-party libraries, I think it's always just a matter of, are they suitable for your purpose? Are they mature enough? Are they better than what you can write? For that case, if you end up using it in two places, that's not a big deal if you've evaluated it and you're tracking new features and that sort of thing. I think the problem with the code-sharing happens more often with internal libraries.

Participant 2: I'm talking about internal, mainly.

Silvanovich: Yes. Your internal stuff. I think that you have to decide: either you split, or you go to the work of tracking that it's always going to be useful for all the purposes you're using it for. There's no right answer there, but it's basically important to do one or the other. Most of the problems occur when people just share and then don't think about the fact that we have all of these competing needs based on everyone who's using it, and we haven't given any thought to how we reconcile them.

Participant 2: Adding to that, also, I feel there is a bureaucracy. We have a software division completely focused on creating libraries. That's their main job. They want you to push all these [inaudible 00:46:51] used in all the business areas and so their core or their job streamline. How do [inaudible 00:47:02]

Silvanovich: I think there's an opportunity there. If they really want you to use their libraries, then maybe they really want to make sure that their libraries are suited to the needs of everything they're being used for. Maybe that's an opportunity to put things like flags in, or other ways to basically make sure that only the minimum features are used in security-sensitive contexts.
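One way the flags idea could look in a shared internal library is sketched below in Python. The feature names, `LibraryConfig`, and `process` are hypothetical; the point is only that the default is the minimal feature set, and that callers outside security-sensitive contexts must opt in to more.

```python
from dataclasses import dataclass

# Hypothetical feature names for a shared internal library.
ALL_FEATURES = frozenset({"basic_text", "images", "scripting", "remote_fetch"})
MINIMAL_FEATURES = frozenset({"basic_text"})


@dataclass(frozen=True)
class LibraryConfig:
    # Default to the minimum; callers must opt in to anything riskier.
    enabled: frozenset = MINIMAL_FEATURES


def process(feature: str, config: LibraryConfig) -> str:
    """Handle a request only if its feature is enabled for this caller."""
    if feature not in ALL_FEATURES:
        raise ValueError(f"unknown feature: {feature}")
    if feature not in config.enabled:
        raise PermissionError(f"feature disabled in this context: {feature}")
    return f"handled {feature}"


# A security-sensitive caller keeps the defaults.
sensitive = LibraryConfig()
# A lower-risk internal tool explicitly opts in to everything.
internal_tool = LibraryConfig(enabled=ALL_FEATURES)
```

With this shape, code paths for disabled features are unreachable from the sensitive caller, which is the effect the flags are meant to have.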


Recorded at:

Feb 20, 2020

Natalie Silvanovich
