
eBPF - Superpowers for Cloud Native Operations


Summary

Liz Rice discusses how eBPF enables high-performance tools that will help connect, manage and secure applications in the cloud.

Bio

Liz Rice is Chief Open Source Officer with cloud-native networking and security specialists Isovalent, creators of the Cilium eBPF-based networking project. She is chair of the CNCF's Technical Oversight Committee, and was Co-Chair of KubeCon + CloudNativeCon in 2018. She is also the author of Container Security, published by O'Reilly.

About the conference

InfoQ Live is a virtual event designed for you, the modern software practitioner. Take part in facilitated sessions with world-class practitioners. Hear from software leaders at our optional InfoQ Roundtables.

Transcript

Rice: My name is Liz Rice. I am Chief Open Source Officer at Isovalent, which is the company behind the Cilium networking project. I'm also the chair of the technical oversight committee at the Cloud Native Computing Foundation. I want to talk to you about eBPF, which is a technology that I've been excited about for a while now. I want to share with you why it's really revolutionizing the way that tools are being built for networking and observability and security, particularly in the world of cloud native.

What is eBPF?

We should probably start by finding out what eBPF is. Traditionally, we'd spell out the acronym; in this case it stands for Extended Berkeley Packet Filter. That's not terribly helpful as a name. What you really need to know is that eBPF lets you write custom code to run in the kernel. You no longer need to write new kernel modules; you can dynamically load and unload these eBPF programs as you need them. You probably know that the application code we write normally runs in userspace. Userspace applications can do very little on their own. They use the system call interface to ask the kernel to do things on their behalf. The kernel is involved whenever your applications do pretty much anything interesting. Every time you write to the screen, open a file, or send a network packet, that communication is going to involve the kernel.

Run Custom Code in the Kernel

With eBPF, we have to write two parts of our application. We write the eBPF program itself, which we write in C, and we write a userspace application. The userspace application uses the system call interface to load the eBPF program and to attach it to some event that will trigger the eBPF program to run. There are lots of different types of events that we can attach eBPF programs to. The original one was the arrival of a network packet; that's where the packet filtering in the name comes from. You can also attach to events like a kprobe, which is the entry to a function in the kernel, or a kretprobe, which is the return from a function in the kernel. There are userspace equivalents, uprobes and uretprobes. You can attach eBPF programs to tracepoints. You can pretty much insert an eBPF program anywhere you want in the kernel: so long as you know what kernel function is involved when something happens, you can attach an eBPF program to the right point in the code.

The traditional thing is to write Hello World, so let's do that. This is my Hello World. I want to start very quickly with the makefile, because I just want to point out that we have an application that we're going to build and also an object file. There are lots of different ways of combining your eBPF object program and the userspace part of your eBPF application. The framework that I'm using in this example allows me to build the object file separately, and then load it and run it through the userspace application. I'm actually building two targets: the executable itself, and the eBPF code as an object file. That eBPF code is very simple. I said it was Hello World, but let's just modify this: I'm going to have it write a trace message. This is a convenience function that will trace out the message and a little bit of useful information about the context it was traced from. I'm going to have this triggered by the execve system call, which gets called every time you run a new program. Any new program that runs on my machine will hit that kprobe, and that will trigger my program to run. This is the part that will run in the kernel. Then there's some Go code that loads the object file into the kernel, gets that Hello program from the object file, and attaches it to the system call. It's also responsible for getting the trace output and printing it to the screen.
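To make that a bit more concrete, here is a minimal sketch of what the kernel-side part of a Hello World like this can look like, written against libbpf-style helpers. It is illustrative rather than the exact code from the demo: the function name, section name, and the kprobe symbol (which varies by kernel version and architecture) are all assumptions.

// hello.bpf.c - minimal kernel-side sketch (libbpf style); illustrative, not the demo's exact code
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

// Attach at the kprobe for the execve syscall handler. The symbol name
// (__x64_sys_execve here) is an assumption; it differs across kernels and architectures.
SEC("kprobe/__x64_sys_execve")
int hello(void *ctx)
{
    // Convenience helper: writes the message, plus the calling process name
    // and PID, to the kernel trace pipe.
    bpf_printk("Hello World: execve called");
    return 0;
}

// eBPF programs must declare a license; some helpers are GPL-only.
char LICENSE[] SEC("license") = "Dual BSD/GPL";

A file like this would typically be compiled to an object file with clang using the BPF target, and the userspace loader then reads the resulting messages from the kernel's trace pipe.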

If I make that, it builds both parts: a Go build of the executable and a build of the C eBPF program into an object file. If I try to run it as an unprivileged user, I don't have permissions. There's a capability called CAP_BPF: it's a privilege that you have as the root user, or that you could grant to individual users, but that you wouldn't have as a regular unprivileged user. If I run as root, we start seeing some trace messages. There's a lot going on on this machine, lots of node and shell executions happening. I'll run ps here in this bash shell. If I stop the output from scrolling, you can see there is my bash-22005, which corresponds to the process ID of running ps. The reason I wanted to point this out is that my very simple Hello World eBPF program can see all these different processes making calls to execve. It doesn't matter who made those calls or where they're coming from. If they're happening on this virtual machine, they're happening in this kernel, so my eBPF program gets triggered. That's one of the really powerful things about eBPF: it's hooked into the kernel, and there's only one kernel on the virtual machine. That's a very simple Hello World. If you want to dive a little bit deeper, I have some slightly more advanced examples on GitHub at github.com/lizrice/ebpf-beginners.

Programmable Kernel in Kubernetes Land

Let's take a little stroll into the world of Kubernetes, and think about what it means to be able to attach eBPF programs to the kernel when we're running application code in containers, in pods. Our containerized application code is primarily running in userspace. Like any other application code, if it wants to do something useful, it's going to have to ask the kernel to do that on its behalf. In Kubernetes, those containers are contained within abstractions called pods, but they all still share the one kernel. There is only one kernel per host machine, which might be a physical machine or a virtual machine. There's only one kernel per Kubernetes node. If we have some applications running in pods, every time those applications try to do anything interesting, like reading or writing to a file, or sending or receiving network traffic, or whenever a new container gets created, the kernel is involved. The kernel is aware of everything that's happening in all the applications that are running on that machine. That means that if we have eBPF programs hooked into the appropriate points in the kernel, they can be aware of all of those interesting things that are happening across all of the pods, across all of the applications that are running in userspace.

Kubernetes-Aware Network Flows

Let's have a think about networking in particular, and Kubernetes awareness. When we have pods running in Kubernetes, every pod gets a unique IP address. Those pods are pretty ephemeral. They can come and go quite dynamically, and it's very common to scale pods up and down in response to demand. Every time you create a pod, it gets assigned an IP address, which means IP addresses are ephemeral too. Traditionally, we might look at network traffic and see flows from one IP address to another, and that would be useful. That's much more difficult to work with in the world of Kubernetes, where these pod addresses are changing all the time.

I wanted to show very quickly how Cilium keeps track of endpoints. Every time a pod is created, Kubernetes asks the networking plugin for an IP address. Let's list all the pods in my Kubernetes cluster here. I've got a number of different application pods running, and I'm using Cilium for the networking. There are two nodes, and I've got one Cilium agent running on each node. Those Cilium agents are responsible for getting the IP address for each pod as it's created on the node. If I exec into one of those agent pods, hqk94, I can ask for a list of the endpoints that this agent is aware of. You can see label information and endpoint information. For each endpoint, we're aware of the Kubernetes labels associated with it, the namespace it's in, and, for example, what service account it's running under. If I just zoom out a bit, we can see a whole list of IP addresses. For each endpoint that this agent is aware of, it can associate the IP address with the pod and the application running in that pod. Pulling that together allows us to draw out network information that is aware of those applications. This is a screenshot of the Cilium UI showing traffic flowing between different services. It knows which pod is involved for each message, because it has that mapping between IP addresses and pods.

That is a very powerful way of looking at networking traffic in Kubernetes. It's possible because of the fact that eBPF applications, including Cilium, can be aware of everything that's happening across the entire node. One really important thing to note about this is that the eBPF code can have visibility of all of the apps running in other pods without having to change those apps. There's no instrumentation involved. We don't need to change the configuration of those apps for eBPF code to be aware of them.

Sidecar Model

Nathan LeClaire posted this really nice prototype bumper sticker on Twitter recently, which I think is a very nice encapsulation of the power of using eBPF for observability and security tooling, which until now has often been implemented using a sidecar model. In the sidecar model, the observability or security tool is injected into each pod as a sidecar container. It's just another container, but because all the containers in a pod share things like network namespaces and volumes, the sidecar can have visibility into what's happening in the other containers in the pod. In order to get that sidecar into the pod, it has to be configured in YAML. This might happen manually, but it probably doesn't; you probably have some automated process that injects the sidecar definition into the application YAML. It might do this pre-deployment, or it might even be done dynamically through an admission control webhook, or something like that. However it's done, the sidecar model requires you to inject that sidecar container into every single pod. If something goes wrong and the sidecar doesn't get correctly configured, it has no visibility into the pod. This is somewhat fragile. If somehow a malicious user manages to run a pod and avoid the injection of the sidecar, then there is no visibility into what's happening in that pod. One of the real beauties of eBPF is that we don't need to make any of those changes. We can have eBPF programs running in the kernel, aware of all of the containers and everything that's happening inside those pods, without the need to modify any YAML anywhere. We don't need to add any instrumentation to the application code either. This is one of the reasons why there are lots of new projects and lots of excitement around using eBPF for observability and security tools, as well as for networking, where it has its historical roots.

eBPF in Cloud Native

Just a quick view of some of these eBPF projects in the world of cloud native. I've already talked a bit about Cilium, which is eBPF-based networking; we can use eBPF to make some parts of networking much more efficient. Falco is a security project incubating in the CNCF. It watches for security events and alerts you when those events happen. Tracee is another project similar to Falco, but perhaps a bit easier to install and configure. It's a bit lighter weight than Falco, and also a less mature project. Then, finally, there's Pixie, the new kid on the block, recently acquired by New Relic, and they've offered to contribute it to the CNCF sandbox. Cilium also has an application for incubation status open right now. All of these projects are taking advantage of eBPF to do really interesting and powerful things in cloud native.

Process Visibility

I want to show you something that we've been experimenting with in Cilium, where we combine two of the concepts I showed you earlier. In the Hello World demo, you could see the process ID and the name of the calling process; eBPF has access to information about processes and the programs running in those processes. We also have information about network flows in Cilium. If we combine those two types of information, we get something like this, where we can see exactly which process, running which executable, in which pod, in which namespace, on which node was responsible for any particular network connection.

In this case, there were network connections to Twitter and Elasticsearch, which looks totally reasonable. Imagine that there was also a connection to a cryptocurrency miner, or to a known command-and-control server for some malicious application. With this information, you'd be able to track back exactly which executable was responsible for opening that connection and which pod it had done that from. That would give you the forensics you might need to trace back where that vulnerability had occurred or how that attack had happened.

Summary

I hope that's given you some insight into the powerful things that are possible with eBPF. eBPF isn't magic fairy dust. It does require you to write code, and some of that code requires a lot of knowledge of kernel data structures in order to access information effectively. It's not an easy task to write eBPF code, but it does offer hugely powerful benefits. So far, I've talked about making the Linux kernel programmable, but it doesn't stop at Linux: Microsoft recently announced eBPF on Windows. It takes the same concepts and the same general abstractions, but it won't be identical. You won't be able to take an eBPF application that runs on Linux and just run it directly on Windows, because the data structures it looks at in the kernel will not be the same. As far as possible, though, the concepts will be kept similar. I think this is a really exciting development for the eBPF ecosystem, not just for Linux.

Resources

I hope that has given you some insight into why I'm so excited about eBPF. If you want to learn more about it, there are some great resources on the eBPF website. Of course, if you want to learn more about Cilium networking, there's a great website for that as well. There is also an extremely helpful Slack channel around Cilium and eBPF.

Questions and Answers

Ruckle: I've been in the cloud native space for a while, and the Linux kernel isn't always something that leaps to mind as a place for innovation. As you described, there are a lot of very clever engineers finding ways to make the kernel more programmable, and that has opened up a whole universe of possibilities. The way you described it towards the end, with the open source projects that are using this and the maturity of things, it sounds like this technology may be making its way to Kubernetes production clusters sooner than folks might think. Where are we with eBPF? I know Netflix had a blog post very recently about how they're using it. What's your take on where we are? Is this an innovators thing? Has it moved into the early adopters phase? Give us a sense of where you think this technology fits on the maturity spectrum.

Rice: It's a really interesting and exciting time for eBPF because of the maturity of the Linux kernel support. When something first goes into the Linux kernel, that's all great, except nobody's using it yet. In reality, people tend to be using kernels that were released a few years ago. It takes a while for a kernel to make its way through all the different Linux distributions. If you're running a Red Hat Linux version, or a long-term support Ubuntu distribution, it's probably using a kernel from a few months ago, possibly even a year or two old. These days, eBPF support has been in the kernel for long enough that it is pretty well established. Most people are running kernels in production that have this capability to run eBPF. That means we've suddenly gone from it being an interesting experiment for those on a very new distribution to the underlying framework, the platform, being there in practically everyone's production environment. This makes it a really good time for eBPF-based tools.

Some of those tools have been around for a while. You mentioned Netflix; Brendan Gregg from Netflix is really one of the innovators in the world of eBPF. He's been talking about, showing, and using eBPF, particularly for performance measurement and tuning, for years, and Netflix have been doing this in production for a long time. Facebook are also very involved in eBPF innovation, and have been using it and publicly talking about some of the things they've done with it for quite some time. If we think about that adoption curve, that crossing-the-chasm curve, in the CNCF we quite like to map different project maturity stages to it: maybe sandbox is for innovators, incubation is for early adopters, and graduation is for the early majority. We're starting to see eBPF projects in that incubation phase. Really, you're not completely on the bleeding edge if you're using some of these tools these days. I hope that gives some indication of the maturity.

Ruckle: I think you make a great point calling out the big hyperscalers like Netflix and Facebook; these things in cloud tend to start with companies that fit that type of profile. If those firms are using it confidently in production at the scale they operate, that maybe means there's maturity and tooling coming into play that lets other kinds of engineering organizations take advantage of this.

Then, a little bit related to the future, get your crystal ball out. Where do you see eBPF in five years? What are your goals and expectations for this technology? I know you compared eBPF to Docker a few months ago, thinking about how it could really transform things. Give us your sense of how that plays out over the next five years, if you could.

Rice: The most conservative view would be this: I think we're going to see very widespread adoption of eBPF-based tools because, as I've tried to convey, they don't require you to modify your application in any way. You don't even have to change the way your app is configured, let alone add any instrumentation. I think that makes them really powerful for all these observability tools that we need more of in a microservices-based environment, and that we need at scale. It seems very likely to me that we're going to see those tools maturing and being very widely adopted. We'll eventually see eBPF being used at scale for a lot of network-based functionality. For example, both Facebook and Cloudflare, I believe, have published about how they use eBPF to help with things like denial of service attacks. With eBPF, you can hook in really early, at the point where a packet first arrives at your machine, potentially while it's still in the network card. You can look at that packet, decide, "I don't like the look of this, this is malicious," and just drop it straight away. That's a very powerful protection mechanism for certain types of attack.
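As a rough illustration of that early-drop idea, here is a minimal XDP-style sketch, assuming libbpf headers. It is not Facebook's or Cloudflare's actual code, and the "malicious" test is deliberately simplistic, just dropping all UDP, whereas a real denial-of-service filter would match far more specific signatures.

// drop_udp.bpf.c - illustrative XDP sketch: drop selected packets at the earliest hook point
#include <linux/bpf.h>
#include <linux/if_ether.h>
#include <linux/ip.h>
#include <linux/in.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_endian.h>

SEC("xdp")
int drop_udp(struct xdp_md *ctx)
{
    void *data     = (void *)(long)ctx->data;
    void *data_end = (void *)(long)ctx->data_end;

    // The verifier requires bounds checks before any packet data is read.
    struct ethhdr *eth = data;
    if ((void *)(eth + 1) > data_end)
        return XDP_PASS;
    if (eth->h_proto != bpf_htons(ETH_P_IP))
        return XDP_PASS;

    struct iphdr *ip = (void *)(eth + 1);
    if ((void *)(ip + 1) > data_end)
        return XDP_PASS;

    // Stand-in for "this looks malicious": here, simply any UDP packet.
    if (ip->protocol == IPPROTO_UDP)
        return XDP_DROP;   // dropped before the kernel network stack ever sees it

    return XDP_PASS;
}

char LICENSE[] SEC("license") = "Dual BSD/GPL";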

The other end of the scale of what we could be seeing in five years is: could we see lots of kernel functionality rewritten in eBPF? Could we see some features of the kernel being replaced with more efficient or more customizable eBPF implementations? This is something we're starting to see, for example, in network data paths, where you don't necessarily have to go through the full IP stack and all of the iptables processing in the kernel if you have an eBPF program that knows exactly where to send a network packet. I don't think we're going to replace the whole Linux kernel with eBPF, but we might see quite a lot of areas of the kernel gaining these alternative implementations. It will be really interesting to see where that goes.

Third part of the answer would be, I am excited about what's happening with eBPF for Windows. What happens if we have eBPF functionality that we can use across operating systems? That could be really powerful and really interesting.

Ruckle: I think the answer to that question is always, it depends. I appreciate you bucketing it into a couple of different scenarios there: a conservative one, an optimistic one, and a mainstream one. There are so many benefits in terms of cost optimization, resiliency, and other things you get from this. It seems like there's some inevitability to this playing some type of role in all of these operations over the next few years. Can several eBPF programs listening to the same events conflict with each other?

Rice: In some cases, no, because depending on the type of program that you're running, you may only be able to observe information, but you can't necessarily change the state. For example, if you're hooking into a system call, you can see the parameters to that system call, but you can't change them. In that sense, it wouldn't really matter which of your different programs came first. They shouldn't be able to affect each other. If you're dealing with network packets, and you can potentially drop them on the floor, that obviously could have an impact. You definitely would need to be able to prioritize them. I don't actually know how you determine the order.

Ruckle: Can we say that eBPF is an evolution of service mesh?

Rice: I wouldn't say it is, on its own, an evolution of service mesh. I do think there are some interesting pieces of service mesh functionality that lend themselves to implementation in eBPF. One example is encryption. One of the things you can do with a service mesh is ensure that the traffic between different services is encrypted, by setting up mTLS between those two services. You can get a similar effect by using network-layer encryption. Cilium supports this, and I believe other networking plugins do as well. If you have that service awareness, and you know that both ends can encrypt and decrypt, you can use things like IPsec or WireGuard to encrypt at the network layer. Although it's not exactly the same functionality, it achieves the same purpose of encrypting that traffic. There are some other network functions related to service mesh that you certainly could implement in eBPF. I would say, think of eBPF more as an environment in which you can build tools than as the tool itself. I wouldn't say it is the service mesh, but it is somewhere you could build service mesh functionality.

Ruckle: I think it will be one of the cool things to see: how the community takes this, how some of the other service mesh communities start to play around with eBPF, and how those may converge or find some really interesting applications of eBPF over time.

Rice: Definitely.

Ruckle: Apart from networking, security, and observability, where else do we see eBPF being used?

Rice: What else is there?

Ruckle: Development. Developers are going to mess around with this.

Rice: I don't think we're going to see userspace applications moving into kernel land, though I could be mistaken about that. I tend to think we're more likely to see existing functionality that happens in the kernel being reimplemented in eBPF. There must be some other interesting use cases that aren't really part of the networking, security, and observability bucket, but networking, security, and observability covers quite a lot of things.

Ruckle: There's a big surface area that the kernel touches, and now you can think about making it programmable. Those are all huge swaths of IT, and things that could potentially be improved or disrupted by the technology. Let's watch this space and see what emerges from the community.

Do you think an abstraction layer over writing eBPF code in C would be helpful for new coders, kind of democratizing eBPF and making some of these ideas more accessible? Is there any thought or momentum around that?

Rice: Today, we write the eBPF program itself in a restricted version of C. It is C code; there are just some things you're not allowed to do. For example, you have to check that a pointer is not null before you dereference it, because you don't want to crash the kernel. I've shown a Hello World here because I think it's interesting and because it helps explain what's happening. I don't really think we're going to have a world of application developers suddenly writing a lot of eBPF code, in much the same way that I don't think we're going to see many people making contributions to the kernel. Most of the time, we rely on the kernel maintainers to do that on our behalf, and I think the same will probably be true of eBPF. What we might see is libraries of eBPF functionality that people can use as building blocks.
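As an example of that restriction, here is a small sketch of the null check the verifier insists on. The map name, the counter logic, and the kprobe symbol are hypothetical, purely to illustrate the rule.

// counter.bpf.c - illustrative sketch of the verifier's null-check rule (hypothetical names)
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

// A hypothetical one-element map holding a 64-bit counter.
struct {
    __uint(type, BPF_MAP_TYPE_ARRAY);
    __uint(max_entries, 1);
    __type(key, __u32);
    __type(value, __u64);
} counter_map SEC(".maps");

SEC("kprobe/__x64_sys_execve")
int count_execs(void *ctx)
{
    __u32 key = 0;
    __u64 *count = bpf_map_lookup_elem(&counter_map, &key);

    // bpf_map_lookup_elem may return NULL. The verifier rejects the program
    // unless the pointer is proven non-null before it is dereferenced --
    // this is the "you can't crash the kernel" rule in practice.
    if (count)
        __sync_fetch_and_add(count, 1);

    return 0;
}

char LICENSE[] SEC("license") = "Dual BSD/GPL";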

We do have an abstraction already in bpftrace, which is a higher-level tool that lets you express where you want to hook an eBPF program, plus some basic functionality like counting events. That is an abstraction of a sort. Another thing that happened recently is that the Rust compiler had a PR merged to support BPF as a compilation target, so you can write your eBPF code in Rust, compile it, and have it loaded into the kernel. That will be a fun thing to try.

Ruckle: Is there a call to action to all the folks to take? What should they be doing with eBPF? Talking to their trusted vendors about it? Contributing to the community? What would you suggest people do as a result of the information you've shared out?

Rice: From an operator's point of view, what we're seeing with eBPF is a new platform technology, a new way of doing things that could be very useful for most people who are operating Kubernetes clusters. I 100% encourage people who are interested to get involved and try it out; have a look at the eBPF beginners examples. For most of us, though, we don't really need to get into the details. Maybe a new observability tool or a new security tool is written in Rust, and we might think, "That's a really cool language, I'm interested in that tool because it's written in Rust," because we're aware that Rust has safety advantages. I'm just using Rust as an example. In the same way, we might recognize that eBPF as a platform has some advantages. When we're thinking about what tools we need in our environment, don't discount anything because it's not written in eBPF, in much the same way as you wouldn't say every tool you use must be written in Rust. But you might say, I want to explore some of these eBPF-based tools because they offer some advantages. Let's explore that.


Recorded at:

Oct 28, 2021
