Armor CLAD Functions

Summary

Guy Podjarny talks about how to properly secure our cloud functions. He uses a model called CLAD to remember what's left to protect, and discusses concrete practices to scale our defences.

Bio

Guy Podjarny is Snyk’s co-founder and President, driving dev-first open source & container security solutions. He was previously CTO at Akamai, co-founded Blaze.io (acq by Akamai), and was acquired by Watchfire and then IBM when building the first AppSec products. He is a frequent speaker, an O’Reilly author, and an early stage angel investor and advisor.

About the conference

InfoQ Live is a virtual event designed for you, the modern software practitioner. Take part in facilitated sessions with world-class practitioners. Connect, see, and speak with like-minded people. Join us to accelerate your learning, be better informed, and drive innovation.

Transcript

Podjarny: I'm Guy Podjarny. I'm here to talk to you about serverless security. I'm the Co-founder and President of Snyk, where we focus on developer-first security. Prior to that, I was CTO at Akamai, after they acquired my previous startup, and very much was a part of the rising world of DevOps. I also host a podcast called "The Secure Developer." I've been digging into serverless security and open source security for quite a few years, including writing a couple of books about it. You can get a free copy of the "Serverless Security" book, which I co-authored with Liran Tal, with the link at the bottom right.

Serverless or Functions?

Let's start with just one slide of taxonomy. Serverless means different things to different people. I'm going to use the definition of serverless as functions. Serverless can be many things, but I think its most prevalent manifestation is functions, like Lambda functions, Google Cloud Functions, Azure Functions. Basically, think about serverless in the context of these functions that run on top of a cloud platform that manages the VM and the operating system for you, and you just deploy the functions. For some people, this may seem obvious, but just to make sure we're level set, this is the definition of serverless, or functions, that I'll be using for this presentation.

Serverless Implicitly Helps Security

We're going to dig a lot into the gaps. This talk is really meant to be a practical talk about what you can take away and act on later. I'm not going to harp too much on the advantages. Let me just start with one slide about the security advantages that serverless brings in. Serverless does implicitly tackle a bunch of security concerns by pushing the handling of them to the underlying platform. There are three notable ones. The first is that it takes away the server wrangling. It takes away the need to patch your servers and your operating system, and unpatched servers are one of the primary ways attackers can get in. Serverless means that the platform patches these servers and operating systems for you. Generally, this is their core competency as part of this platform, so they do it quite well.

The second thing it does is tackle denial of service attacks pretty well. Serverless naturally and elastically scales to handle large volumes of good traffic. In the process, it can also handle a substantial amount of bad traffic that might be trying to use up your capacity so that you can't serve legitimate users. You can still get DoS'd, and you can definitely get a pretty big bill if you're using serverless, but it's harder, so it does help.

Then the third is probably something that doesn't get as much credit as it should, which is long-standing compromised servers. Really, serverless means that the servers are very short-lived. These components that run your software, they come in and they go away. That implies that a very typical attack pattern, which is to get into a server, compromise it, install your agent, and then proceed from there, which is how most attacks get carried out, can't really be done. Attackers need to do more of an end-to-end attack in one go, which is harder and carries a higher risk of exposure.

Serverless Security - CLAD Model

Serverless helps with all of these things, but it doesn't get the whole job done. There's a lot of responsibility that still lives with you. Let's dig into what that is. We're going to go through it in a model I call CLAD. It's made up of your function's code, which might contain vulnerabilities. Libraries, which are the components, the binaries, that you pulled in through your app, as opposed to through the operating system, for instance from npm, or Maven, or PyPI. They're still your components. They're still a part of your application. Over time, they might have known vulnerabilities in them that attackers can exploit. We'll talk about access, or configuration, which is where you've given too much permission to a function, and therefore made it either more risky if an attacker compromises it or easier for an attacker to access it. And data, which is a little bit different in serverless, because you take away the transient data that might live on a server.

Code (Function)

Let's go one by one and start with code. That's the heart of what we're trying to do. Here's an example of a function. This is a Lambda function in Python. It simulates an e-commerce store that creates an S3 file for every order made. Then this function gets called when the order is fulfilled, so that the file is amended with the date to indicate that it was fulfilled. The first section here just reads the S3 bucket and the key to use. It splits out the order number from the key name. Then it goes to S3 and downloads that file into /tmp. It downloads the file and stores it, so it's now a file name over there. It goes off, gets the current date, and then uses an operating system action, echo, to append the date to the file, and goes back to upload that file to S3.

The scariest piece of this code is probably that os.system. Oftentimes, there be dragons. Indeed, that's the case over here. Really, the security mistake, the real one, happens a little bit further up, which is over here. We're very used to S3 objects, and they look awfully like files. We oftentimes think about them as files, but they're not files, they're objects. Per AWS documentation, their key names can actually hold any UTF-8 character. That includes, for instance, a semicolon. Over here, when I do os.system, and I replace these curly brackets with the download path, I'm actually potentially allowing a remote command execution. Let's say the payload looks like this: it has the S3 URL, it has the three slashes. Then, after that, instead of an order ID, there's a null semicolon, and then a command that might send some local information to a malicious server. It's a pretty bad attack. It's because of that semicolon. I'm using this as an example, first of all, to say that there's nothing in serverless that would protect you against this. This type of remote command execution vulnerability can happen in serverless just as much as it can happen in a non-serverless, serverful application.
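The slide's code is not reproduced in this transcript, so the following is only a minimal sketch of the pattern described, not the original function. The function names and the sample path are hypothetical. It shows how pasting a path into a shell string lets a semicolon start a second command, and one possible alternative that appends the date without a shell at all.

```python
def unsafe_command(download_path: str, date_str: str) -> str:
    """Build the shell string the risky way: the path is pasted in verbatim.

    Passing this string to os.system would let the shell interpret any
    metacharacters (such as ';') that arrived in the object key.
    """
    return 'echo "{}" >> {}'.format(date_str, download_path)


def append_date_safely(download_path: str, date_str: str) -> None:
    """Append the date with plain file I/O, so no shell ever parses the path."""
    with open(download_path, "a", encoding="utf-8") as handle:
        handle.write(date_str + "\n")


if __name__ == "__main__":
    # Hypothetical path derived from an attacker-controlled object key.
    hostile_path = "/tmp/1; echo pwned"
    # The printed string contains two shell commands, not one.
    print(unsafe_command(hostile_path, "2021-01-10"))
```

With the second function, a key containing a semicolon is just an odd file name rather than a command separator.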

Also, note that what we trusted here was not HTTP input, but rather an S3 file name. That's a very common mistake in the world of serverless. I think we've learned in the world of development to be cautious with HTTP traffic. You really need to think about every function as its own perimeter. In the OWASP example that I took this from, this function makes an assumption that is actually wrong: that the S3 bucket is safe, that attackers cannot create files within that S3 bucket. It shouldn't have made that assumption. The function should validate its input so that it is secure even if the controls around it are not perfect, because serverless is made out of blocks that you can move around and combine in different ways.

Securing Functions Code

For that first one, secure your code. It's your function. It's your code. There can still be vulnerabilities there. Specifically, be mindful of all event inputs, not just HTTP inputs: SNS messages, S3 file names. Treat every function as a perimeter. To be able to scale that, you really need to use shared security libraries. You're going to have many functions. It's just not practical or realistic to think that your development team would always get that right and, for every function, properly sanitize every source of input. It's easier if you create or choose a sanitization library that they can use, one that, for instance, sanitizes S3 file names or the like. That's one, that's code.
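As a rough illustration of what such a shared helper could look like, here is a small sketch that validates an S3 key against an allow-list pattern before any function uses it. The key layout (`orders/<digits>.txt`), the function name, and the exception name are assumptions made for this example, not something the talk specifies.

```python
import re

# Hypothetical key layout: "orders/<numeric order id>.txt".
_ORDER_KEY = re.compile(r"orders/(?P<order_id>[0-9]{1,12})\.txt")


class UntrustedInputError(ValueError):
    """Raised when an event input does not match the expected shape."""


def order_id_from_s3_key(key: str) -> str:
    """Return the order id from a key, rejecting anything unexpected.

    fullmatch is used so that extra characters before or after the
    expected pattern (for example ';' and a trailing command) fail.
    """
    match = _ORDER_KEY.fullmatch(key)
    if match is None:
        raise UntrustedInputError("unexpected S3 key: {!r}".format(key))
    return match.group("order_id")
```

Allow-listing the expected shape tends to be easier to reason about than trying to strip out every dangerous character, and every function that consumes these keys can share the one helper.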

Libraries - Sprinkles of (OSS) Infrastructure in Your Functions

Second is libraries. I like to think of these as sprinkles of infrastructure that are in your functions. They're a part of the app, and we get used to thinking about them as the app or the function. In practice, they behave very much like infrastructure. Just like an operating system or a server might have an unpatched NGINX, a function might have an unpatched express, or other library. There are quite a few of them. Let me share some numbers. I looked at Snyk's projects. We protect about a million and some projects, and many of them are serverless. I did a quick analysis of the median number of dependencies that the serverless functions in our repository have. It's substantial. A function uses, by median, 6 to 16 libraries. Maybe more interesting is that these components use other components, which use other components. In total, the number of dependencies, the number of libraries, is dramatically bigger. It's one, sometimes more, orders of magnitude bigger than these direct dependencies. There are a lot of components. A lot of components that might have a vulnerability. A lot of them can grow stale: they might not have had a known vulnerability, but now a new disclosure came along and showed that they actually have a security flaw.

My third number here is, for each of these four ecosystems, how many zero days, or rather, new disclosures of vulnerabilities in these components, took place in the last 12 months alone. That's a lot. If you do the math, even without an exact calculation, and you think about many functions, many libraries, and many vulnerabilities, the odds of you having significant holes in your fence, significant ways for an attacker to easily walk in, are pretty high. You really should ensure that you tackle that. That's an infrastructure-ish type of risk that you still need to control.

What do you do about it? First of all, you have to know what you've got. You want to make sure that you invest in tracking which components are being used by every function. Here, I'm showing an example of using Snyk directly on the Lambda functions. You can also scan them in the Git repos, or in the pipeline, whatever works for you. You should keep note of which functions, especially the ones in production, use which components. Then track whether new vulnerabilities get released on them. Second, you want to invest in remediation. You're going to get these alerts often. The reality is that these happen all the time. You want to make sure that once you've found out about an issue, it's easy for you to fix it, typically through an upgrade, and roll that out.
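To make the "know what you've got" step concrete, here is a toy sketch that cross-references a per-function dependency inventory with a list of advisories. It is not how Snyk or any real scanner works. The function names, package names, versions, and advisory data are all made up for illustration, and it assumes plain dotted numeric versions with the same number of parts.

```python
from typing import Dict, List, Tuple

Version = Tuple[int, ...]


def parse_version(text: str) -> Version:
    """Parse a plain dotted numeric version such as '1.4.2' (assumed format)."""
    return tuple(int(part) for part in text.split("."))


def affected_functions(
    inventory: Dict[str, Dict[str, str]],
    advisories: Dict[str, str],
) -> List[Tuple[str, str, str, str]]:
    """List (function, package, installed, fixed_in) for outdated packages.

    inventory maps function name -> {package: installed version}.
    advisories maps package -> first version that contains the fix.
    """
    findings = []
    for function_name, deps in sorted(inventory.items()):
        for package, installed in sorted(deps.items()):
            fixed_in = advisories.get(package)
            if fixed_in is not None and parse_version(installed) < parse_version(fixed_in):
                findings.append((function_name, package, installed, fixed_in))
    return findings


if __name__ == "__main__":
    # Entirely hypothetical inventory and advisory data.
    inventory = {
        "create-order": {"example-http-lib": "1.4.0", "example-yaml-lib": "2.0.1"},
        "list-orders": {"example-http-lib": "1.5.2"},
    }
    advisories = {"example-http-lib": "1.5.0"}
    for finding in affected_functions(inventory, advisories):
        print(finding)
```

The point of keeping the inventory per function is that a new advisory can be mapped straight to the deployed functions that need an upgrade.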

Securing Functions Libraries

That's the second piece, libraries. You want to make sure that you monitor them, that you streamline remediation, and that you know what you've got on these different functions over time.

Access

Third is access. Access or permissions, there's a lot of ways to think about this. It's really about the difference between, what can your function do and what should it be able to do? What's the right minimum set? In serverless, what you oftentimes see is a pattern like this. This is a serverless YAML file. This pattern happens in every ecosystem. What you can see here in the middle is it uses a single file to define multiple functions. Makes sense. It has create, list, get. They're all deployed together. A lot of considerations about whether that's good or bad, but it's common. Then at the top of that file, you have a permissions set. You basically have the iamRole, what can they do?

Putting all of these into one file is awfully convenient, but what it actually does is give every one of these functions the superset of the permissions that all of them need. Permissions are these finicky beasts. They basically never contract. Once you give some function a permission, and it runs, it's really scary to take that permission away. You really don't know what might break. The reality is that they never contract. They just expand until somebody adds an asterisk. You really want to invest in shrinking that and having the right policies in place from the get-go. You want to remember that a single security policy might be easier, but the safe way to go is to invest in having a policy per function. Then, if you do that well, not only are you overcoming a problem, you're actually better off than you were before, because in the monolith situation, if you have a single app and it has all those functions in one, the platforms don't allow you to split permissions. You can't say this piece of the code has this permission, and this piece of the code has the other. Functions and serverless allow you to do that. Take advantage of it instead of making it a flaw.
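The effect of a shared role can be sketched with plain sets. The function names create, list, and get come from the slide described above. The specific actions assigned to each are an assumption for illustration only.

```python
from typing import Dict, Set


def shared_role(needed: Dict[str, Set[str]]) -> Set[str]:
    """One role for all functions: the union of everything any of them needs."""
    return set().union(*needed.values())


def excess_permissions(needed: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Permissions each function gets from the shared role but does not need."""
    shared = shared_role(needed)
    return {name: shared - perms for name, perms in needed.items()}


if __name__ == "__main__":
    # Assumed minimal permissions per function (illustrative only).
    needed = {
        "create": {"dynamodb:PutItem"},
        "list": {"dynamodb:Scan"},
        "get": {"dynamodb:GetItem"},
    }
    for name, extra in sorted(excess_permissions(needed).items()):
        print(name, "could drop", sorted(extra))
```

With a policy per function, each entry in `needed` becomes that function's whole policy, and the excess is empty by construction.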

Securing Functions Access

How do you secure access? You give functions their minimal permissions, even if it's harder. I highly recommend isolating experiments from production. Serverless makes it very easy to deploy stuff. You can deploy all sorts of experiments quickly and try them out. If they don't work, they don't cost you anything once you've deployed. What you forget is that, again, these libraries that are sitting there, or other security mistakes, grow stale. They can get found over time. New vulnerabilities might be disclosed on them. It's risky. Really, you want to separate functions out by the level of care that you're going to give them. Your production functions should be in one spot, maybe even multiple spots, where they're getting proper maintenance. You know what's on them. You know that you're going to address security issues there. Then anything that's experimental, that's not going to get the same level of attention, really should be away from that secure surrounding and from your customer data. Then, if you really want to level up, the plus one here is to actually try to build a system that tracks unused permissions and reduces them over time. Whether you do it through logs or through more of a chaos engineering style, remove a permission and see what happens. If you manage to build that competency, that would be very powerful for keeping your functions and that application as secure and as safe as they can be.
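The log-based variant of that idea reduces to a set difference between what a function was granted and what it was observed using. This is a sketch under the assumption that you can export (function, action) pairs from your platform's audit logs. The record format and all names here are hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple


def unused_permissions(
    granted: Dict[str, Set[str]],
    observed_calls: Iterable[Tuple[str, str]],
) -> Dict[str, List[str]]:
    """Return, per function, granted actions that never appear in the logs.

    observed_calls is an iterable of (function name, action) pairs,
    assumed to come from an audit log export covering a long enough window.
    """
    used: Dict[str, Set[str]] = {}
    for function_name, action in observed_calls:
        used.setdefault(function_name, set()).add(action)
    return {
        name: sorted(perms - used.get(name, set()))
        for name, perms in granted.items()
    }
```

Anything this reports is only a candidate for removal, since a permission may be needed on a rarely taken code path that the log window did not cover.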

Data - Input and Output into Your Functions

Then the last in our sequence here of CLAD is data. With data, again, you have to remember that, at the end of the day, applications typically are just processing data. It's some piece of logic that takes some data in and puts some data out. Serverless is no different. These functions still process data, and they need to do it well. However, in serverless, there's also the concern that you lost the opportunity to store transient data. Things like session data, or log data that you might have temporarily put on the machine or even held in memory, you can't do that anymore. The result is that much more of that data gets stored outside the function. It might get stored in some Redis session cache. It might get stored in another spot. You have to be mindful of how you secure that data, because just like before, when we talked about the perimeter, you don't know who has access to that data, or where it would go.

Here's one example. Again, I'm using Snyk for it. You can use various solutions for it. Here, we inspect the Terraform script that deploys a serverless function. What we can see here is that we've enabled logging outside, of course, of the function. There's no point storing logs on an immutable and very short-lived server that's running a function. We're storing them outside but we forgot to turn on encryption, which in this case, is a recommendation. They're not encrypted at rest. Who knows who has access to it? Again, Redis sessions, various others. You have to be mindful of what you do there.

Securing Functions Data

Data is important. Once again, one aspect of it is that serverless doesn't magically make your data security concerns go away. You just need to be mindful. A bit more specifically to serverless, I would advise you to keep secrets away from code. Serverless makes everything so easy. Secrets are a little bit harder, and using a KMS is just a tad harder. Not hard, just not quite as easy as some of the other things. It's very tempting to just check some secret or some key into your code repository. Don't do that. It's very easy to steal them later, and it's hard to rotate them. Try to use a KMS, or at least environment variables. Secure data in transit: when you think about these functions, data moves between network entities, between functions, far more than before. Are you securing it when it's in transit? When you're going to third-party components and reading data back, because it's not all on the same machine, you cannot trust the channel that these functions communicate through. You can, but then, again, you're assuming. You're not treating every function as a perimeter. If things move around, you're quite fragile. Consider encrypting it. Consider verifying the identity of the other entity that you're talking to.
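Two of these points can be sketched with the standard library alone: reading a secret from the environment instead of the code, and checking that a message really came from a holder of that secret. The variable name `ORDER_SIGNING_KEY` and the helper names are hypothetical. Note that HMAC signing gives authenticity and integrity, not confidentiality, so it complements encryption in transit rather than replacing it.

```python
import hashlib
import hmac
import os
from typing import Mapping, Optional

SECRET_ENV_VAR = "ORDER_SIGNING_KEY"  # hypothetical variable name


class MissingSecretError(RuntimeError):
    """Raised when a required secret is not configured."""


def load_secret(name: str, environ: Optional[Mapping[str, str]] = None) -> bytes:
    """Read a secret from the environment rather than from source code."""
    env = os.environ if environ is None else environ
    value = env.get(name, "")
    if not value:
        raise MissingSecretError("secret {!r} is not configured".format(name))
    return value.encode("utf-8")


def sign(payload: bytes, key: bytes) -> str:
    """Produce an HMAC-SHA256 tag the receiving function can check."""
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def verify(payload: bytes, signature: str, key: bytes) -> bool:
    """Check the tag in constant time before trusting the payload."""
    return hmac.compare_digest(sign(payload, key), signature)
```

In a real deployment the environment variable would typically be populated from a KMS or secrets manager at deploy time, which keeps rotation out of the code path entirely.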

Then, last but not least, think about that transient data, that session data. This isn't more severe than the other two, it's just a little bit newer for serverless development. If you've come from developing for non-serverless, and you might have been used to holding session data in memory, you might not have thought to encrypt it. Now when you store it off to, like, a side Redis, maybe you should.

Summary - CLAD Model

That's the CLAD model. It basically says serverless is amazing. It takes care of a lot of security concerns for you implicitly, but it leaves you with code, libraries, access, and data, all of which you need to secure.

Scale

Let me leave you with two more thoughts on it. One is a word about scale. Today, you might have 20 functions, 30 functions, 50 functions. It might seem manageable. It's a small number, and you might be auditing them, or surveying their security manually, but it won't work. Serverless is all about scale. Tomorrow, you're going to have 500, then 5000 functions. If you don't invest in automation and observability, just being able to know what's going on, you're going to get in trouble. Now that you're building out your practices, make sure that you are aware of which functions are there. What's their current security status? Which components do they run? What are their permissions? You really need to get ahead of this, otherwise it's going to be really hard to untangle the mess that might get created.

Serverless Security Requires DevSecOps

Then the second, unrelated thought is even above the tech. Serverless is all about speed of development. It's about being able to deploy these functions again and again, and have them be small units that just work with good APIs. There's no room, there's no time, there's no opportunity for an external security team to butt in. It won't fit the business needs to have a security team come in, stop the deployment process, and audit. The only way to scale it is the DevSecOps approach, where, really, you want to have empowered developers and give them the tools, the ownership, and the mandate to secure what they're building. Then you want to have a security team whose job is really to help these developers do so better and more easily all the time, and to make sure that they actually get that done. With that model, you can actually scale security beyond serverless, for cloud native development and development as a whole.

Serverless is awesome. It adds a lot of security layers, but there's a lot of responsibility that still lies with you. Take it seriously.

Recorded at:

Jan 10, 2021

Guy Podjarny
