How eBPF Empowers Developers to Observe Inside the Linux Kernel in a Safe and Unintrusive Way

Dan Fineran explores how eBPF has evolved far beyond its roots in packet filtering into a robust, safe way to extend the Linux kernel. He explains how the eBPF "verifier", the security guardrail, enables implementation of deep observability and networking without the risks of traditional kernel modules or the slow upstreaming process. He touches on tools like Tetragon that leverage eBPF for "front-foot" security enforcement, proactively intercepting threats such as buffer overflows before they execute, while providing visibility into file systems and drivers without intrusive instrumentation.

Key Takeaways

  • eBPF circumvents the lengthy Linux upstreaming process and the fragility of kernel modules. By leveraging a strict "verifier" (a security "bouncer"), it ensures eBPF programs cannot block execution, crash the kernel, or access invalid memory, providing a safe, hot-deployable extension mechanism.
  • While eBPF is famous for networking (Cilium), its true power lies in observability and enforcement across any syscall. It allows engineers to gain deep insights into file systems, storage layers, and device drivers without needing to instrument the application code itself.
  • Unlike traditional agents that monitor after an event occurs, eBPF allows for "front-foot" enforcement. By attaching pre-hooks to syscalls, you can intercept and block malicious behavior (e.g., unauthorized file deletion or buffer overflows) before the kernel executes the command, enabling live-patching for critical CVEs.
  • eBPF is evolving beyond a Linux-exclusive technology. With emerging support for Windows, organizations can begin to unify their observability and security policy architecture across heterogeneous operating system environments.
  • In the era of widespread AI-driven contributions, perform strict due diligence on open-source eBPF projects. Evaluate the robustness of the human maintainer community, long-term support capability, and code authenticity rather than focusing solely on the feature list or the volume of AI-generated submissions.

Transcript

Olimpiu Pop: Hello everybody, I am Olimpiu Pop, an InfoQ editor. And I have in front of me, Dan Fineran. I will not try to pronounce his name because I probably will butcher it, so let’s have Dan introduce himself. This conversation started after QCon. Dan had a fabulous presentation, and a lot of people were just eyes on the slides, but we learned a lot because he had some code as well. But he was concerned, while chatting in the elevator, about how much of the gist of what eBPF actually is had reached the audience. So then we said, let's give it a try and give it another angle, me playing the role of the noob. I had exposure to eBPF previously, but it's always good to hear more about it. But without further ado, Dan, please introduce yourself and teach me how to pronounce your last name so I can do better next time.

Dan Fineran: It's a funny surname. To a certain degree, many people do pronounce it in myriad ways, which is always fine. My name is Daniel or Dan Fineran. So it's an Irish surname. Cinnamon, Sinneren, a lot of different mispronunciations, but Dan Fineran typically. So, a quick overview: I am part of the community team at Isovalent. And it is now Isovalent at Cisco. So, around 18 months ago, we joined the Cisco family. Isovalent is the company that effectively spearheaded the creation of the eBPF project and then built projects on top of eBPF, namely the Cilium and Tetragon projects, which typically live in the cloud-native ecosystem.

Decoding the Name: The Origins of eBPF [02:23]

Olimpiu Pop: So let's start with the simple things. Well, if any of them are simple when we are discussing the Linux kernel. eBPF has a lengthy name that includes something with Berkeley. And what I remember from the other conversations I had with the guys from Isovalent in previous years is that it's more or less a keyhole into the Linux kernel. And that says a lot, because we know the guys from Linux are, how should I put it, very protective of Linux, per se. And then the question is what eBPF actually is and how they allowed such an exception, because it probably has to be something very important if they managed to do that.

Dan Fineran: I guess we'll kick off with the first thing to think about, which is the naming. So eBPF is the technology that we're going to kind of pull apart and understand. And I guess a little bit of history is that it was originally based upon BPF, which is the Berkeley Packet Filter. BPF is effectively a way of configuring a kernel to perform packet matches and allow or filter network traffic. And the way it worked was that you would write these filters, attach them to a running system, and those filters would see traffic and do what they needed to do. And people faced challenges with the Linux kernel; they wanted to do more and more with it.

So the question was: do we start from scratch, or do we extend this tried-and-tested packet filtering mechanism? So as I said, we can write some packet filters and attach them to a running system. And it was a case of: we've already kind of got some hooks in the kernel that have been allowed; maybe we can extend them. So eBPF did stand for extended Berkeley Packet Filter. However, the community has reached a consensus that the extension has grown so large that eBPF no longer bears any resemblance to BPF. So eBPF now doesn't actually stand for anything. The community basically uses "eBPF" as the name for everything we do in that space within the Linux kernel, and it's the technology that powers it.

Why eBPF? Solving Kernel Constraints [04:50]

Dan Fineran: Yes, the Linux kernel and the applications that run on top of Linux are ubiquitous now; they're everywhere. Mobile phones, highly available systems, I wouldn't be surprised if space launchers and rockets and things like that were running Linux kernels. So there's a lot of heavy-duty infrastructure that is now stable and runs atop the Linux kernel. So the idea of suddenly being able to crack open a tried and tested platform and start twiddling things generally is quite scary to most people, and it's not really what is wanted. And we've seen this before because the Linux kernel is kind of gated, making it very hard to get code all the way into it. So if you wanted to change the behaviour of a Linux system, you would typically need to obtain the Linux source code. You would need to find where that change that you want to make is. You would need to create a patch set. That patch set would need to be reviewed by several people.

So, for instance, if it were a new driver for storage, the storage team would need to review it first. And then the next layer would need to review it. And then the next layer would need to review it. And then, finally, if you're very, very lucky, Linus or Greg, the two overarching gatekeepers of the Linux kernel, will finally give a go or a no-go. And your code may finally make it, rarely, but it may finally make it into the Linux kernel. And then you need to wait for Ubuntu, Red Hat, or whoever it is to finally adopt that kernel, so you can actually get a running system with your changes in place. And one of the main reasons why eBPF came to fruition is that the process is so lengthy. And there are no guarantees that any changes you want to make will make it through all those steps. And as we mentioned, concerns like security and breaking the Linux kernel are why these various layers exist: they are there to ensure bad code doesn't make its way into the Linux kernel. But there are still opportunities for that to happen.

Dan Fineran: So if you don't want to go through the path of upstreaming to a Linux kernel, your next option is the concept of a kernel module. So, a kernel module is a way to take the source code for the kernel you are running on your current system and write code that hooks into the side of the existing kernel. So you'll run insmod to insert the module. And the code that you added will now be part of the Linux kernel and will be run when the path of execution hits your module. Now that's one way of extending an existing kernel. However, it also comes with a bunch of issues.

So first and foremost, kernel modules typically have to be compiled against a particular kernel version because internal structures change and things like that. So if I wanted to make a change for everybody, I would need to create kernel modules for every kernel version that conceivably could exist. So you have massive technical overhead. Additionally, kernel modules can crash a kernel. So if I have bad code in my kernel module that just loops indefinitely, blocks execution, or otherwise does something that is not allowed, you can take down a running system with a kernel module.

The Verifier: How eBPF Ensures Stability [08:33]

Dan Fineran: eBPF is designed in a way to ensure that those sorts of issues that exist with things like kernel modules are no longer present. And it gives us velocity and speed, because we don't have to push things upstream through all the various chains of who will approve this code, who will merge it, and who will actually add it to the mainline kernel itself. That's kind of like the problem statement of why eBPF came into play. And the crux of it really is stability. And to ensure that eBPF code can never impact a running system, there are many guardrails in place around eBPF.

And two of them really come into play when actually writing eBPF code itself. So you can write some eBPF code that does something. What it does is irrelevant at this point. But when you first try to compile your eBPF code into a form that can be attached, the compilers that are going to build that code come with strict rules that will effectively only allow code that looks a certain way to actually compile into eBPF code. So compilers will run very stringent checks: are we trying to access things that are out of bounds? Are we effectively doing things that could break a system? So that's kind of the first guardrail.

The second and most stringent guardrail: you can think of the kernel as a nightclub. Think of your eBPF code wanting to go to that nightclub. You have what's called the verifier. And that is the bouncer, the security on the door to your nightclub. And that eBPF verifier is there to perform a large number of checks on your code to ensure you're actually allowed to run in the first place. So even if you compile your code, there's still no guarantee that the verifier will allow you to then load that into the kernel. And the verifier will do things like unroll all of the loops. So if you have a loop that will run for X amount of time, it will unroll it all. It will step through that loop. It will try to run things without variables being populated to ensure that out-of-bounds memory accesses can't occur. The verifier is filled with hundreds of tests to ensure that eBPF code can never affect a running system. It can never block or get stuck in a loop forever. All of those different things. And those are the first two main guardrails to ensure that eBPF can't affect a system.
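To make the bouncer's job concrete, here is a minimal sketch in plain Python of the kind of load-time checks described above. It is a toy model and not the kernel's verifier: the three-instruction set, the `verify` function, and the step budget are all invented for illustration. Only the 512-byte stack size is borrowed from real eBPF.

```python
class VerifierError(Exception):
    """Raised when the toy verifier refuses to load a program."""


STACK_SIZE = 512  # real eBPF programs are limited to a 512-byte stack


def verify(program, max_steps=10_000):
    """Walk a program before it ever runs and reject anything unsafe.

    Hypothetical instruction set, for illustration only:
      ("read", offset)  read one stack byte at a constant offset
      ("jump", target)  jump unconditionally to instruction index `target`
      ("exit",)         finish the program
    """
    pc = 0
    for _ in range(max_steps):
        if not 0 <= pc < len(program):
            raise VerifierError(f"control flow reaches invalid instruction {pc}")
        op, *args = program[pc]
        if op == "exit":
            return True
        if op == "read":
            (offset,) = args
            if not 0 <= offset < STACK_SIZE:
                raise VerifierError(f"out-of-bounds stack read at offset {offset}")
            pc += 1
        elif op == "jump":
            (pc,) = args
        else:
            raise VerifierError(f"unknown instruction {op!r}")
    # "exit" was never reached within the budget: treat it as a possible endless loop.
    raise VerifierError("program may not terminate")


def try_load(name, program):
    """Print whether the toy verifier would admit the program."""
    try:
        verify(program)
        print(f"{name}: accepted")
    except VerifierError as err:
        print(f"{name}: rejected ({err})")


if __name__ == "__main__":
    try_load("safe", [("read", 0), ("read", 511), ("exit",)])
    try_load("bad read", [("read", 512), ("exit",)])
    try_load("endless loop", [("jump", 0)])
```

The point of the sketch is the ordering: every check happens before the program is allowed to run at all, so a rejected program never gets the chance to misbehave.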

Olimpiu Pop: Okay. Let me see if I got it correctly. Plainly speaking, eBPF is a sandbox inside the Linux kernel. So that will allow you to run things you actually want to run at the kernel level without needing to go through all the changes, and, actually, you don't want to go through all those points. And if I understand correctly, you can do something like a hot deploy, so you don't even need to restart the whole machine itself. That allows you to experiment and do a couple of things. And you have multiple layers of security that ensure the code you actually provided is good enough not to break the kernel itself.

Dan Fineran: 100%, yes.

Olimpiu Pop: Okay, great. But taking a step back, it's obviously a great technology. Looking at most of the personas, which one would like to have more to do with eBPF? Like most of the developers, they will not care about it. And then we have to look more into who the actual developers or other personas are who will need to care about it. Or let me put it a different way. Is it iptables and eBPF, or do we need an OR between them, or an AND?

Beyond Networking: Expanding the Use Cases [12:40]

Dan Fineran: Oh, good question, good question. It's a bit more complicated than that, to the point that eBPF isn't just networking. The way an eBPF program works is that you write your eBPF code, and you can attach it to a myriad of different places within a running kernel. So we may want to attach it to areas known as kernel probes. What that means is that whenever a particular function is fired within the Linux kernel, say file open, our eBPF code is executed first. We can attach to syscalls and do something fundamentally similar. In my role, we tend to think about attaching our eBPF code to the networking stack to do clever networking. There's no nice way of sugarcoating it, though; writing eBPF code is not the easiest thing to do. You typically need low-level languages. C is what most eBPF code I see is written in. However, there is a really big effort now to enable and help people write eBPF code in Rust, so, you know, a more modern, memory-safe language that compiles into an eBPF program.

And you know, to your point, really, who needs to care about eBPF? That is a very good question. I mean, I'm part of the eBPF community, so from my perspective, everybody. But really, that's not the case at all. Take people who just want to do networking with Cilium inside a Kubernetes cluster. Kubernetes is complicated enough. With Cilium, we kind of abstract away everything so we can give people the principles to say, you know, this workload can speak to this workload. People don't really need to know what happens under the covers, you know, kind of the amazing engineers who have built eBPF and have built Cilium, they've taken care of the implementation details for you, you know, like you don't need to care about iptables, you don't need to care about the eBPF side of things. We present the user with the nice UX side of things. But we are seeing more and more people writing eBPF code now, and we are seeing many more use cases.

Observability is a really big one at the moment, so you know, writing some eBPF code that is going to hook into the storage layer. That is going to hook into device drivers. We have seen a pretty cool use case where people have written eBPF code which hooks into the Nvidia CUDA drivers for the GPUs and will basically be able to get low-level information about how busy the GPUs are, correlate that to how much workload is coming in, to ensure that GPUs never go quiet, and we don't end up wasting electricity and things like that as well.

So the drivers were never designed to do that. We have basically been able to hook in with eBPF code, see the drivers' memory structures, and understand what's kind of happening. The personas are changing. We saw a lot of people writing clever networking with eBPF, you know, redirecting transparently, removing the need for iptables. Now we're seeing people writing eBPF code to hook into userland applications to observe everything happening within the running system, to look at encryption, the behaviours of a running system, and things like that as well.

Olimpiu Pop: Okay, so the summary of what you said is it depends, right? So the proper answer always.

Dan Fineran: Very political.

Olimpiu Pop: Well, that's the answer that we tend to give for everything. That means that eBPF is a very powerful technology that provides access to the Linux kernel. And everything happens in the Linux kernel. Probably file access, network access, pretty much everything that you have running on a Linux server, and probably you have to take a step back and be more inclusive these days. And actually it's not only Linux; not long ago Windows opened the doors as well. So any operating system in the sphere of Linux or Windows, right?

Cross-Platform: eBPF on Linux and Windows [16:42]

Dan Fineran: That's correct, yes. So in a very rare series of events, Microsoft have effectively allowed the capability of extending the Windows kernel with, at the moment, eBPF drivers. So effectively this is something you need to sideload into Windows. With Linux it's there out of the box. eBPF has been in the Linux kernel since 4.something or other, like super stable since 5. With eBPF for Windows, you currently need to load in some additional drivers. The idea being at some point in the future, eBPF will just be part of Windows out of the box. One of the cool things really is that the eBPF code that you compile will basically just run anywhere because it is its own bytecode and you can effectively then run that in either a Windows sandbox or a Linux sandbox, attach it to the various areas of either kernel, and then you know, the eBPF code will do what it's programmed to do.

Olimpiu Pop: Okay, so at this point in time, it's probably fair to say that eBPF is the way to go if we want very fine-grained access to information at the kernel level of an operating system. As you mentioned, that covers access to drivers, access to storage, and so on and so forth, or pretty much everything at the kernel level. And everything really is everything, because everything runs there, in Linux or Windows operating systems.

And probably if you deploy applications, those will be the only ones that you care about. I don't particularly hear about many people deploying on Windows, but if you are part of that very small percentage around the world, then yes, you have eBPF now. One of the applications built on it, more focused on security, is Tetragon, another project coming from Isovalent, now part of Cisco obviously. And if I remember correctly, Tetragon was very focused on securing your operating system. Maybe we can look through that lens and say, okay, in these particular cases, Tetragon will use eBPF to secure your machine, because I think it did two things: one of them was observability and the other one was action at the kernel level.

Tetragon: Real-Time Security and Observability [19:08]

Dan Fineran: 100%. So Tetragon is a second product that was developed inside of Isovalent. And where Cilium effectively hooks into the network and will move traffic around, Tetragon, as you mentioned, is there really to do two things. One, it will hook into any area of the system that you want it to. So you can effectively hook into any syscall that the kernel is going to do, and you can simply have it in a reporting mode where every action that is happening within your system is audited. So you'll get full audit logs. Every time somebody tries to open a file, we'll be able to see what that file was, who tried to open it, things like that. We can again audit whoever is elevating privileges. So every time I try to become root, that is audited.

The kernel will do that, but Tetragon will see that actually occurring and will report all of that back. So there is pretty much nothing that we can't access inside the kernel. Every syscall we can attach to. Every time that syscall is triggered, Tetragon will be made aware that it has been triggered, who triggered it, and what all of its parameters look like. Auditing is incredibly useful and incredibly powerful, but to me the main power lies in the enforcement side of things. And again this is one of the reasons why eBPF is more powerful than some of the previous tools that existed. Many years ago I had to install an agent across an entire fleet for things like PCI DSS, one of the security requirements. And this agent, I can't remember the name of it, but it was centrally managed and it was there to do some level of enforcement.

The problem with the way that the agent worked was it was kind of always on the back foot in that it couldn't hook into the kernel. The kernel would do something and then the agent would see it happening and then try and stop it or revert the behavior and things like that. eBPF hooks into a running system so that when the kernel tries to do something, eBPF will be run first before the kernel will actually run the thing that it was meant to run. So it has hooks inside the Linux kernel: there is a pre-hook and a post-hook. Take the order of execution if, say, I wanted to open a file and I wanted to open /etc/passwd to get everybody's details. Typically what would happen is the syscall file open would be executed and the file would then be opened and mapped to where it needs to be. With eBPF, a pre-hook where our eBPF code would be attached means that when we try to do the file open, our eBPF code is executed first. We can look at what is actually going to be sent to the actual syscall. We can make a decision based upon it there.

Then the Linux kernel will execute its actual code for file open. And then after that we can hook into a post-hook to see the results of everything that has actually happened. So we can go before and after, and we can observe and enforce across the entire spectrum. So the behaviors and things that we can change within the Linux kernel typically means that when the Linux kernel is actually going to execute its file open, we've actually changed things. And the Linux kernel is none the wiser. So we're not on the back foot. We're on the front foot. We are in front of the kernel before it actually tries to do things. Which really gives us a lot of power in protecting running systems. AI agents going quite crazy at the moment, there's a lot of people, you know, kind of can you tidy this up for me. When you say tidy up I meant tidy up the codebase, not delete my entire file system and tidy that up for me.
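The pre-hook, kernel function, post-hook ordering described above can be modeled in a few lines of plain Python. This is only an illustration of the control flow, assuming a made-up `HookedCall` wrapper and a stand-in `file_open` function; it does not use any real eBPF or Tetragon API, and the blocked path is just an example.

```python
class Denied(Exception):
    """Raised when a pre-hook refuses to let the call proceed."""


class HookedCall:
    """Wraps a stand-in 'kernel function' with pre-hooks and post-hooks."""

    def __init__(self, kernel_func):
        self.kernel_func = kernel_func
        self.pre_hooks = []   # each: hook(*args) -> True to allow, False to deny
        self.post_hooks = []  # each: hook(args, result) -> None

    def __call__(self, *args):
        # 1. Pre-hooks run first and see the arguments before the call happens.
        for hook in self.pre_hooks:
            if not hook(*args):
                raise Denied(f"{self.kernel_func.__name__}{args} blocked by pre-hook")
        # 2. Only then does the original function do its work.
        result = self.kernel_func(*args)
        # 3. Post-hooks observe the outcome.
        for hook in self.post_hooks:
            hook(args, result)
        return result


def file_open(path):
    """Stand-in for the kernel's file-open path; returns a fake handle."""
    return f"handle:{path}"


audit_log = []
hooked_open = HookedCall(file_open)
hooked_open.pre_hooks.append(lambda path: path != "/etc/passwd")
hooked_open.post_hooks.append(lambda args, result: audit_log.append((args[0], result)))

if __name__ == "__main__":
    print(hooked_open("/tmp/notes.txt"))
    try:
        hooked_open("/etc/passwd")
    except Denied as err:
        print("denied:", err)
    print(audit_log)
```

Because the decision is taken in step 1, a denied call never reaches the wrapped function, which is the "front foot" position: there is nothing to detect and revert afterwards.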

So you know, we can effectively have Tetragon policies which say if we ever see a syscall for file delete and the path of it matches /home/dan/super-important-project, simply do not allow the syscall then to actually execute, or just don't allow it to happen. And then the syscall will effectively either fail or just not execute. Again the deleting file systems and things like that, we've seen a lot of AI agents do, but a lot of other things really are enforcing who can do what within a running system. You know, if you have a database, Oracle for instance running or MySQL. Ultimately there'll be a MySQL DB file on your file system. Now the only process that should ever access that is the mysqld, the daemon process for MySQL.

So really, only MySQL should ever be trying to access all of that. So we can kind of correlate between who is allowed to do what; anybody else is simply not allowed to do that sort of thing. Again, who's allowed to go to superuser. And what we've been able to do with Tetragon really is not just, you know, kind of simple policies about what can and can't be opened and read and who can escalate to superuser and things like that, but building policies which mitigate against CVEs. So some libraries for instance have buffer overflows and things like that. If you were to send 40 bytes where 30 bytes are expected and the last 10 bytes are some code that will be executed to give you this, that and the other.

Well we can hook into the pre-hooks before that library is actually accessed. We can see what data is going to be sent to that library. And we can see that it's going to be a buffer overflow. There's 40 bytes coming in here and we know that due to the CVE and how that looks, that's too many bytes. Somebody's trying to do a buffer overflow here. And we can stop that from happening. So not only are we protecting systems, we are live protecting against vulnerabilities that are happening on a daily basis. Which I think is very powerful.
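The three kinds of rule mentioned in this answer (protect a directory from deletion, let only the database daemon touch its data file, reject oversized input to a vulnerable routine) can be sketched as a small decision function. This is a plain-Python illustration, not Tetragon's policy format: the `Event` fields, the file paths other than the project directory quoted above, the `parse_input` call name, and the 30-byte limit are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """What a pre-hook might see before a call runs (fields are illustrative)."""
    call: str          # e.g. "unlink", "open", "parse_input"
    binary: str        # executable that triggered the call
    path: str = ""     # file path argument, if any
    data_len: int = 0  # size of the input buffer, if any


# All names and numbers below are hypothetical examples.
PROTECTED_DIR = "/home/dan/super-important-project"
DB_FILE = "/var/lib/mysql/data.db"
DB_DAEMON = "/usr/sbin/mysqld"
VULNERABLE_CALL = "parse_input"
MAX_INPUT_LEN = 30  # the vulnerable routine expects at most 30 bytes


def under(path, directory):
    """True if path is the directory itself or anything inside it."""
    return path == directory or path.startswith(directory + "/")


def decide(event):
    """Return ("deny", reason) or ("allow", "") for one event."""
    if event.call == "unlink" and under(event.path, PROTECTED_DIR):
        return ("deny", "deleting inside a protected directory")
    if event.call == "open" and event.path == DB_FILE and event.binary != DB_DAEMON:
        return ("deny", "only the database daemon may open its data file")
    if event.call == VULNERABLE_CALL and event.data_len > MAX_INPUT_LEN:
        return ("deny", "input longer than the vulnerable routine can hold")
    return ("allow", "")


if __name__ == "__main__":
    samples = [
        Event("unlink", "/usr/bin/rm", PROTECTED_DIR + "/src/main.c"),
        Event("open", "/usr/bin/cat", DB_FILE),
        Event("open", DB_DAEMON, DB_FILE),
        Event(VULNERABLE_CALL, "/usr/bin/app", data_len=40),
    ]
    for event in samples:
        print(event.call, event.binary, decide(event))
```

The length rule is the "live patch" idea in miniature: the vulnerable code is left untouched, and the oversized input is simply never allowed to reach it.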

Olimpiu Pop: I had a conversation probably last year after QCon with Marina Moore from Edera. And one of the drums that they are beating is about kernel-level problems: any kind of CVE that affects the Linux kernel, buffer overflows for instance, might affect the whole orchestration of containers that are running there. So that's why they are one of the voices that are preaching for secure boot, and on the other hand they are preaching against containers in some situations and actually insisting on using micro virtual machines. Given that they are very small, they give the best of both worlds: not the heavyweight virtual machines you used to have previously, but also fast boot times and so on and so forth.

Dan Fineran: Yes.

Olimpiu Pop: Is the combination of eBPF and Tetragon another way of looking at that issue? Meaning we have a Kubernetes cluster, we don't want to go to full virtual machines, I don't know why, but we do have Tetragon and eBPF, so Cilium together with Tetragon. Will that help us?

Micro VMs vs. eBPF: Architecting Secure Clusters [26:23]

Dan Fineran: Yes, absolutely. I would like to point out that Edera recently have updated their kernel, their micro VM kernel, so that eBPF is now supported inside their running micro VMs. And that I think is kind of a testament, because it's okay putting your containers inside micro VMs, but if you put a web server in there and there is still the capability of exploiting the application that's running in there, you're still going to get inside the Kubernetes network and things like that. You are still going to be able to do things that exist within that micro VM. Even if you can only infect that micro VM, that still means you're now inside the running environment and things like that as well. So for me I think separation of kernels is a good thing, because you know, if you break one kernel, you can effectively break everything that sits on top of it.

However, I think a combination of both is best: an isolated or virtualized kernel, and having Tetragon there to ensure that we can observe what is actually happening within a running system and can still live patch if need be. Because as mentioned, even if that VM is still vulnerable in one way or another, we can mitigate bad behaviors through eBPF simply and easily, with Tetragon or other eBPF-related means, to guard against those found CVEs and things like that. For me, best of both worlds is one or both.

Olimpiu Pop: Okay, thank you. The other thing that I was thinking of is how close are we to self-healing systems? Because back in the day when I was doing my master's thesis, that was all the rage. But it was still a very far away dream. Now we are discussing agents that allow us to write code on the spot based on human-readable feedback, and you have a way to look into the Linux kernel. Are we close, or are we still far away from it?

Toward Self-Healing Infrastructure [28:54]

Dan Fineran: So we have been experimenting, we experimented just in the community side of things with trying to get some of the AI providers to generate policies based upon some behaviors that we kind of asked the AI provider to create for us. It would generate Tetragon policies for us. Or what looked like Tetragon policies. When we applied the Tetragon policies, it's like some of these fields don't make any sense whatsoever. It looks and smells like a Tetragon policy, but like the model and the spec of the policy is wrong. However, I think as the models improve and you know, we potentially put more resources together so that these models can understand how and what is required to generate a working policy that Tetragon understands, we could well be in a position where a CVE is found, it could be well that things like Mythos, the new AI exploit tool from Anthropic.

Olimpiu Pop: Let's say tool from Anthropic.

Dan Fineran: Yes, yes. So that finds a CVE in the Linux kernel and immediately can effectively say, right we notice a buffer overflow. We know it's in this function, this syscall, whatever. So we'll immediately now generate a corresponding Tetragon policy that attaches to a pre-hook on this syscall and ensures that that buffer overflow simply can't occur. So I really do think we're quite close to having that loop in place where it simply can generate policies as it finds issues that we're continuously reconfiguring a live kernel which effectively says these inputs are validated or these inputs aren't valid so we're simply not going to allow them. And that is really what we're kind of doing and talking a lot about at Cisco at the moment.

One of the things that we announced last year was Tetragon on Cisco switches. And the idea there being that Tetragon will monitor the control plane behavior of a Cisco switch and ensure who is logging onto a Cisco switch, who is doing what with a Cisco switch, who is trying to elevate their privileges and things like that. But we're wanting to move that more into the thing that you desired earlier, which is that loop of exploits, critical vulnerabilities are found, automagically, I like that term. A resulting policy is automatically generated and applied, and as soon as the issue is actually found, the systems are automatically protected against it. And we're getting there I think.

Olimpiu Pop: That sounds quite interesting, especially when you're talking about Cisco servers because that's a little bit more core infrastructure. And I'm thinking that probably a better place to have those kind of things is small offices and also households, because in the end, I don't know how often, even in my place, I do the firmware update, because it's quite problematic, and that's something that definitely will help a lot. Especially since the place where cybercrime is growing the most is on the household front, because people are just not that keen on getting protected or they don't even know about it. But I think we just got to the big guns with everything that Tetragon can do and eBPF can do.

One of the things that probably is closer to most of the other developers is probably observability. Because one of the promises that eBPF has is that you can properly observe your applications without the need of actually instrumenting your code. Because that was one of the biggest problems from my point of view. You had an application going and then you wanted to see what happens and then you had to instrument it. So that meant extra libraries, so on so forth and go to the whole cycle, deploy it and then see what actually happens. And then there was also the question what is the amount of time extra added to each call. And from the way how I see it, eBPF can do that for you.
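As a loose user-space analogy for observing an application without editing it, Python's standard `sys.setprofile` hook can report every function entry and return while the application code itself stays untouched. This is not eBPF and it only sees Python-level calls; the `observe` helper and the two sample functions are made up for the illustration.

```python
import sys


def observe(func, *args):
    """Run func(*args) and record entry/return of every Python function it calls."""
    events = []

    def probe(frame, event, arg):
        # "call" and "return" fire for Python functions; C-level events are ignored.
        if event == "call":
            events.append(("entry", frame.f_code.co_name))
        elif event == "return":
            events.append(("return", frame.f_code.co_name))

    sys.setprofile(probe)
    try:
        result = func(*args)
    finally:
        sys.setprofile(None)  # always detach the probe
    return result, events


# The "application": note that it contains no logging or tracing code at all.
def parse(text):
    return text.strip().upper()


def handle_request(text):
    return parse(text)


if __name__ == "__main__":
    result, events = observe(handle_request, "  ok ")
    print(result)
    for kind, name in events:
        print(kind, name)
```

The entry and return events here play the role that probes and return probes play for eBPF: the hook is attached from outside, and the observed code is neither changed nor redeployed.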

Observability Without Code Instrumentation [32:59]

Dan Fineran: Yes. So there is, I mentioned kprobes and syscalls and things like that, there are uprobes which are user probes. Which will allow us to attach our eBPF program to userland applications, that's system libraries and things like that. And through all of those different areas that we can connect to, we can profile a variety of different areas in terms of where a program is interacting with things, what it's actually seeing, what it's doing, what the results getting back are actually looking like, time stamp all of that. And I think another area which is going to be very interesting for a lot of people moving forward, it's observability but it's also scheduling of workloads as well can now be added as an alternative to the Linux scheduler.

So you have multiple programs actually running and our eBPF code can actually do the scheduling, so we can effectively say these programs have a higher priority: whenever we need to schedule the next workload in, these should be given a much higher priority, they're always at the front of the queue to go back on the CPU and things like that as well. So in terms of performance and things like that, as mentioned, we can hook in at user level, and we can hook into how our programs interact with the actual kernel itself through syscalls and fentries and kprobes. We can see every library call that our program is making through uprobes. We can see all of the data that it's sending to other userland libraries and back and things like that as well. So we can get observability into absolutely everything that it is doing, and control in terms of how the kernel is actually running our program as well.
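The "always at the front of the queue" idea can be sketched as a tiny priority run queue. In the kernel this role would be played by an eBPF-based scheduler (the sched_ext framework in recent Linux kernels); the class below is a plain-Python toy with made-up names and priorities, meant only to show the pick-next decision.

```python
import heapq
import itertools


class PriorityRunQueue:
    """Toy run queue: higher priority runs first, ties run in arrival order."""

    def __init__(self):
        self._heap = []
        self._arrival = itertools.count()

    def enqueue(self, task, priority=0):
        # heapq is a min-heap, so negate priority to pop the highest first.
        heapq.heappush(self._heap, (-priority, next(self._arrival), task))

    def pick_next(self):
        """Return the task that should get the CPU next, or None if idle."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]


if __name__ == "__main__":
    queue = PriorityRunQueue()
    queue.enqueue("batch-report", priority=1)
    queue.enqueue("latency-critical-api", priority=10)
    queue.enqueue("log-rotation", priority=1)
    while (task := queue.pick_next()) is not None:
        print("run", task)
```

A real scheduler also has to deal with time slices, multiple CPUs, and starvation of low-priority work, all of which this sketch leaves out.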

Olimpiu Pop: Okay, just to summarize: we start from the way Linux is structured. You have the kernel space, where all the magic happens under the hood, and the user space, where the applications actually run. eBPF provides probes for both spaces. For the kernel space we have kernel probes, as you mentioned before, and kernel return probes, which look at what a kernel call returned. That lets us observe both the entry and the return of a call. Then there are the tracepoints, which allow us to see what is happening on different types of events.

It's more or less the same set of categories for the user space: uprobes allow us to see dynamically what happens, there are return probes, and then you have statically defined tracing again. So pretty much the same thing, but for applications, and these are the points that we can use for observability. There are probably a couple of misconceptions about eBPF. Maybe you can give us a summary of what some people think it is, because I suppose you've heard quite a few.
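The summary above amounts to a small taxonomy: kernel versus user space, and dynamic versus statically defined attachment points. As a rough sketch, it can be written down as a lookup table in Python. The `Probe` type and the `probes_for` helper are hypothetical names for this illustration, and the `fires_on` descriptions are simplified.

```python
from typing import NamedTuple, Optional


class Probe(NamedTuple):
    name: str
    space: str     # "kernel" or "user"
    kind: str      # "dynamic" (attached at runtime) or "static" (predefined point)
    fires_on: str  # simplified description of when the probe triggers


PROBES = [
    Probe("kprobe", "kernel", "dynamic", "kernel function entry (typically)"),
    Probe("kretprobe", "kernel", "dynamic", "kernel function return"),
    Probe("tracepoint", "kernel", "static", "predefined kernel event"),
    Probe("uprobe", "user", "dynamic", "userland function entry (typically)"),
    Probe("uretprobe", "user", "dynamic", "userland function return"),
    Probe("USDT", "user", "static", "marker compiled into the application"),
]


def probes_for(space: str, kind: Optional[str] = None) -> list:
    """Return the probes for one space, optionally filtered by kind."""
    return [
        p for p in PROBES
        if p.space == space and (kind is None or p.kind == kind)
    ]


for probe in probes_for("user"):
    print(f"{probe.name}: {probe.kind}, fires on {probe.fires_on}")
```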

Misconceptions and the Learning Curve [35:45]

Dan Fineran: As I have mentioned, getting into writing eBPF is not for the faint-hearted. It is getting easier, but because of the strict rules about what we allow to run in a Linux kernel, you can only reach a certain level of complexity with an eBPF program. You're not going to be able to run everything within an eBPF program, although I'm pretty sure I did see somebody trying to run Doom in eBPF, which I'm not sure is technically possible yet, but that seems to be the benchmark for doing crazy things with computers. As the verifier improves, a lot of the rules that we had for eBPF programs are being relaxed, so eBPF programs can get bigger and more complex. You still can't do everything within an eBPF program, and for some things you will need to use userland, but it is maturing at a very fast pace, and because of all the improvements we're seeing massive adoption.

The main thing is that you just can't do everything with eBPF as of today, but it's still meeting pretty much all of the requirements of most people who need to use it, and it is improving and growing in terms of functionality. That's the main one, really, that a lot of people have pushed back on. It's difficult to get going with eBPF, and you may find it limiting if you're trying to do something incredibly complex, because programs can only be so big and loops can only go on for so long, since the verifier needs to unroll them to make sure that they actually end. And I think a lot of people do struggle with the verifier.

One of the big issues you will find is that when you write code, compile it and then try to load it, the verifier will typically say, "this code cannot be attached because", followed by a crazy hex string. And it's like, well, I don't know what that means. So the output of the verifier can be quite hard to understand. But there is work on improving that feedback loop as well. For me, really, it's just that the barrier to entry is quite tough. Being part of the community, I do my best to educate people and provide examples, and I'd like to hope people are getting more and more into it. We're certainly seeing more and more eBPF-related projects, a lot more adoption, and a lot more people wanting to get their feet wet by writing their first eBPF programs. Liz Rice has put out a newer version of her book recently as well, I think, so there are multiple resources out there to help people over that initial tough learning curve.
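To make the "loops can only go on for so long" point concrete, here is a deliberately tiny toy model in Python. It is not the real verifier and does not reflect its actual algorithm or limits: `toy_verify`, its parameters, and the budget of 1000 are all hypothetical. It only illustrates the idea that a loop must have a provable bound and that the total work implied by that bound must fit within a fixed instruction budget.

```python
def toy_verify(loop_bound, body_len, budget):
    """Toy acceptance check for a single loop (illustration only).

    loop_bound: maximum iterations the checker can prove, or None if unknown.
    body_len:   number of instructions in the loop body.
    budget:     hypothetical cap on instructions the checker will walk.
    Returns (accepted, reason).
    """
    if loop_bound is None:
        return False, "loop has no provable upper bound"
    processed = 0
    for _ in range(loop_bound):
        # Walk the loop body plus the branch back, once per iteration.
        processed += body_len + 1
        if processed > budget:
            return False, f"processed more than {budget} instructions"
    return True, f"ok after {processed} instructions"


print(toy_verify(loop_bound=16, body_len=4, budget=1000))
# (True, 'ok after 80 instructions')
print(toy_verify(loop_bound=None, body_len=4, budget=1000))
# (False, 'loop has no provable upper bound')
print(toy_verify(loop_bound=1000, body_len=4, budget=1000))
# (False, 'processed more than 1000 instructions')
```

The third case shows why a loop that does terminate can still be rejected: the bound exists, but walking it exceeds the budget.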

Olimpiu Pop: Okay, is there anything else that I should have asked you, but I failed to do so?

Dan Fineran: No, I guess not. I mean, eBPF is open source, it's part of the Linux kernel, and the Windows part of it is open source as well. So this is a community thing. If you want to get into it, there are communities around all of that, and there is an eBPF community Slack. If you want to learn more, those are all good places to start. We also have labs available: if you go to isovalent.com/labs, you can spin up a free lab in a web browser and play with some eBPF code to try and get your head around it. There are more and more projects that are based on eBPF now. But to use those projects, you typically don't need to know eBPF. I drive a car, but I have no idea how the motor works. So use the tooling and try not to worry too much about eBPF, unless you are one of the personas that is going to want to write your own observability, your own clever networking, or enforcement and things like that.

Olimpiu Pop: So eBPF is a technology that is very useful for providing more observability and security on the Linux kernel and, lately, also on Windows. But it's not for everybody. You can use tools that are based on eBPF, and if you are really curious, it will probably be more than enough to play in the online playground where you can see more. But it's not for the faint-hearted, and it's probably something that not many developers will touch.

Dan Fineran: I mean, I want everybody to play with eBPF, but the main thing I wanted to stress was that if you see a tool based upon eBPF, more often than not you won't need to know eBPF in order to use that tool. I don't want people to suddenly think, "Oh, I'm not going to use this tool because I don't know eBPF". That's usually not a requirement of the tooling, it's just what's actually powering it. But I want people to learn eBPF, I want people to play with it and understand it, and that's the flip side of it, I suppose.

Olimpiu Pop: Okay, so let me ask you a different thing, because you opened Pandora's box there. You are working for the car factory, right? Now the question is: somebody's pitching an eBPF-based tool to me. What should I ask the people maintaining it, and what should I be careful about, before I actually buy in and start running it?

Evaluating Open Source Projects [41:00]

Dan Fineran: I mean, that's always a tough one. I think judging the health of an open source project generally is quite tough. Typically there are a couple of indicators you can look for: how often is this project receiving updates, and what does its contribution base look like? Is it one person in a basement, or is it a big company with hundreds of developers? You're trying to get an idea of how many people are actually using it, because if it has a massive user base then typically more and more people will be contributing to it as well. And I think, and this is going to be a tricky line to walk, we are seeing a lot of AI-generated submissions in various projects. There are some models which apparently now understand eBPF code a little bit more. Now that's great, I want more code being written.

The flip side, I think, though, is if I'm going to adopt a project that's using eBPF, well just a project, it doesn't make a diff... eBPF is kind of irrelevant in this space, it's more a general open source project issue: do the people who created this project understand how the code works? Are they going to be able to support it? I mean, models are decommissioned every now and then. If tokens become too expensive, like they may do, are they going to be able to support this project moving forward? If it breaks or a bug is found, who's going to fix the bugs?

So I worry a little bit about the disconnect between the developers who own the projects and whoever is actually writing the code for them, and the gulf between understanding the code in the project and supporting it moving forward. People are creating stuff and throwing it over the fence at a crazy rate at this point, and I'm seeing a lot of cool new projects, but with no voice to speak to about how the project is going to move forward or what its lifecycle is going to look like. So it's a very interesting time at the moment, I'd say.
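The indicators Dan lists (how recently a project was updated and how broad its contribution base is) can be roughed out from commit history. The following Python sketch assumes you have already extracted `(author, date)` pairs from a repository; the `project_health` function, its output keys, and the sample history are hypothetical, and the numbers are a starting point for questions rather than a verdict on any project.

```python
from collections import Counter
from datetime import date


def project_health(commits, today):
    """Summarise simple health indicators from (author, date) commit pairs."""
    if not commits:
        return {
            "days_since_last_commit": None,
            "contributors": 0,
            "top_author_share": None,
        }
    authors = Counter(author for author, _ in commits)
    last_commit = max(day for _, day in commits)
    # Share of all commits written by the single most active author.
    top_share = authors.most_common(1)[0][1] / len(commits)
    return {
        "days_since_last_commit": (today - last_commit).days,
        "contributors": len(authors),
        "top_author_share": round(top_share, 2),
    }


# Made-up history for illustration only.
history = [
    ("alice", date(2024, 1, 10)),
    ("alice", date(2024, 2, 1)),
    ("bob", date(2024, 2, 20)),
    ("alice", date(2024, 3, 5)),
]
print(project_health(history, today=date(2024, 3, 15)))
# {'days_since_last_commit': 10, 'contributors': 2, 'top_author_share': 0.75}
```

A high `top_author_share` hints at the "one person in a basement" situation, but it says nothing about whether the maintainers understand the code, which is the question Dan suggests raising directly with the project.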

Olimpiu Pop: Okay, so the short answer would be: do your own due diligence, and if you have any doubts, just go to the next shop and see what's there.

Dan Fineran: If you have doubts, raise an issue and ask the question. That's the whole point of open source: the back and forth and, hopefully, the community. So yes.

Olimpiu Pop: Thank you Dan Fineran. Thank you for your time.

Dan Fineran: Excellent pronunciation. Thank you very much for your time as well, this was fantastic.

More about our podcasts

You can keep up-to-date with the podcasts via our RSS Feed, and they are available via SoundCloud, Apple Podcasts, Spotify, Overcast and YouTube. From this page you also have access to our recorded show notes. They all have clickable links that will take you directly to that part of the audio.
