
Learning eBPF for Better Observability

Key Takeaways

  • Focus on identifying a use case or problem to solve with eBPF, for example tracing application syscalls to identify open files.
  • The landscape is rapidly changing, and new tools and libraries are added. It is better to use existing "proven stable" tutorials, even when they are a bit older.
  • High-level development libraries require proficiency in Go, Rust or C/C++. This is due to the nature of compiling the code into bytecode for the Kernel.
  • You don’t need to understand the low-level Kernel code to learn eBPF. Strengthening your knowledge can help with advanced development and debugging your own eBPF programs later.
  • Automated testing of eBPF programs in CI/CD pipelines is complicated, and needs more standalone test possibilities without a running Kernel.

This article shares insights into learning eBPF as a new cloud-native technology which aims to improve Observability and Security workflows. The entry barriers can feel huge, and the steps to using eBPF tools to help debug in production can be many. Learn how to practice using the tools, and dive into your own development. Iterate on your knowledge step-by-step, and follow up with more advanced use cases later. Lastly, we will discuss ways to automate development in CI/CD, and its challenges.

How to get started with eBPF?

I first heard about eBPF around 2021, when it came up in Observability discussions, and I could not really make sense of it at first. The descriptions claimed a new way to collect event data to help with Observability, as well as security observability and enforcement.

Actually, I learned at a later point that Falco uses eBPF to inspect container activities in Kubernetes. My learning history involved seeing Falco as a cloud-native security tool, and not questioning the underlying technology. The "Hacking Kubernetes" book helped refine and complete my learnings about container runtimes, eBPF and security enforcement.

The eBPF day at KubeCon EU 2022, and later the eBPF Summit event, helped shed light here. The learning strategy is similar to everything in technology: listen, take notes, and accept that you won't understand everything yet.

Attending talks and reading articles often brings up a pattern of terms to recognize: eBPF, BPF, bcc, bpftrace, and iovisor are terms I remembered immediately. Brendan Gregg’s blog was mentioned everywhere too.

A community meetup with barcamp-style pitched talks enabled me to ask, "How to get started with eBPF?" Three slides kicked off a discussion about how it works, verified the knowledge together, and explored use cases. At eBPF Summit, there was a Capture-the-Flag environment to start learning which allowed me to stop and explore the challenges hands-on. After all, I settled on collecting all eBPF resources on my public learning platform - and decided to learn in public, with all mistakes, misunderstandings and problems made on the way.

Kernel development sounds hard, and can be an entry barrier into understanding and getting started with eBPF. Changing the approach to using tools and libraries that leverage eBPF, with production use cases - debugging in production for example - greatly helped me learn and iterate. A general understanding of Linux operating systems, resource handling, and troubleshooting is helpful too.

Higher-level presentations and explainer images can help with a general understanding of the eBPF architecture. I really like the explanation by Brendan Gregg:

"eBPF does to Linux what JavaScript does to HTML. (Sort of.) So instead of a static HTML website, JavaScript lets you define mini programs that run on events like mouse clicks, which are run in a safe virtual machine in the browser. And with eBPF, instead of a fixed kernel, you can now write mini programs that run on events like disk I/O, which are run in a safe virtual machine in the kernel. In reality, eBPF is more like the v8 virtual machine that runs JavaScript, rather than JavaScript itself. eBPF is part of the Linux kernel."

eBPF was added to the Linux kernel to enable small sandboxed programs. This also resolves the tension between a stable kernel and the need to innovate: eBPF programs can extend the Kernel and drive innovation without blocking Kernel development.

Use cases with eBPF include high-performance networking and load balancing, tracing applications, and performance troubleshooting. Additionally, fine-grained security observability and application/container runtime security enforcement come to mind.

Writing an eBPF program is hard; the Kernel expects bytecode which isn't efficient to write by hand. Therefore, abstraction layers are required, including compilers that can generate the bytecode from higher-level programming languages. Tools that are often mentioned in this context are Cilium, bcc, and bpftrace. The Kernel verifies eBPF programs at load time, before just-in-time compilation translates the bytecode into the machine-specific instruction set. This makes static verification in CI/CD workflows harder. More on that later.

After getting to know the requirements, the real question is, what are practical examples to try and learn, and then dive deeper into actual source code?

Really get started: a playground

Brendan Gregg’s "Learn eBPF Tracing: Tutorial and Examples" blog is a great starting point. Different attempts and routes always came back here for self-paced learning. It’s a great strategy to try different tools on the command-line and test their effectiveness, before diving deeper into libraries and how eBPF programs are built.

Note: Liz Rice's Learning eBPF book, published in March 2023, reduces the entry barrier once more.

A recommended start is to choose a Linux distribution with a recent kernel >= 4.17, for example Ubuntu 22.04 LTS. Use a local virtualization method, or spawn a VM at your preferred cloud provider. The following example uses the Hetzner Cloud CLI to spin up a new Ubuntu VM:

$ hcloud server create --image ubuntu-22.04 --type cx21 --name ebpf-chaos

Depending on your needs to re-create the setup, consider writing Ansible playbooks or a script that repeats the installation steps. This can be helpful to share with teams to learn specifically about the tools and libraries used in your environment. The tools and ideas discussed in this article are available as Ansible examples in this GitLab project. There are default tools to be installed (git, wget, curl, htop, docker), and more specific use cases for eBPF, chaos experiments, and observability.

The next sections discuss examples of eBPF tools. To build and install them, the Linux kernel headers and additional dependencies are required. An additional step on Ubuntu 22 LTS was to enable the DDebs repository to get access to debug symbols, next to a full compiler toolchain. This Ansible configuration for eBPF tools describes the installation steps in much detail. You can inspect the Git history to follow the learning steps, and errors made on the way. The following sections focus on running the tools, and explaining their use cases.

Tracing syscalls

You probably have used the strace command to trace the syscalls of a running binary application, to see which files are opened, whether permission errors occur, etc. The tutorial blog from Brendan Gregg recommends starting with the bcc toolchain which provides the execsnoop command. It can trace the exec() syscall. One easy way to test this is either an open SSH connection, or an additional terminal which executes the curl command:

$ execsnoop -t

115.816 curl             879320 879305   0 /usr/bin/curl
118.481 sshd             879322 67197    0 /usr/sbin/sshd -D -R
124.287 sshd             879324 67197    0 /usr/sbin/sshd -D -R

We have learned a new way to trace syscalls. The bcc toolchain provides more practical tools and use cases. From a learning perspective, which other tools can we look at to dive deeper into eBPF?
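The execsnoop columns (timestamp, command, PID, PPID, return code, arguments) are easy to post-process, for example to build an incident timeline. As a small sketch, assuming the `-t` output format shown above (the helper name is my own, not part of bcc), a few lines of Python turn each line into a structured record:

```python
# Sketch: parse execsnoop -t output lines into structured records.
# Assumes the column layout shown above: TIME(s) COMM PID PPID RET ARGS.
def parse_execsnoop_line(line):
    parts = line.split(None, 5)  # split on whitespace, keep the argv column intact
    if len(parts) < 6:
        return None  # header or malformed line
    ts, comm, pid, ppid, ret, args = parts
    return {
        "time": float(ts),
        "comm": comm,
        "pid": int(pid),
        "ppid": int(ppid),
        "ret": int(ret),
        "args": args,
    }

record = parse_execsnoop_line("115.816 curl             879320 879305   0 /usr/bin/curl")
# record["comm"] == "curl", record["pid"] == 879320
```

Feeding `execsnoop -t` output through such a parser makes it easy to filter for failed exec() calls (`ret != 0`) or group events by parent process.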

bpftrace: high level tracing language

Bpftrace provides its own high level tracing language, similar to debugging frameworks such as DTrace. At first glance, the online examples can feel overwhelming, but since we are using a test VM, we can run them, and analyze the language later. Bpftrace allows you to trace more syscalls, for example open(). This method is used to open files, sockets, etc. - generally everything that a process could open, with good or bad intentions. It can be seen as a more modern approach to the strace command again.

In order to test bpftrace with a predictable example, you can use this minimal C program that opens a file handle to create a new file (source code):

#include <stdlib.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>

int main()
{
  int fd;

  if ((fd = open("ebpf-chaos.txt", O_WRONLY | O_CREAT, 0660)) == -1) {
    printf("Cannot open file.");
    return 1;
  }

  close(fd);
  return 0;
}
Compile the C program with the gcc compiler, and run it after starting the bpftrace command. If the command fails to run on Ubuntu 22 LTS, install the debug symbols from the DDeb repository.

$ gcc sim-open-call.c -o sim-open-call
$ chmod +x sim-open-call
$ ./sim-open-call


$ bpftrace -e 'tracepoint:syscalls:sys_enter_openat { printf("%s %s\n", comm, str(args->filename)); }'

Attaching 1 probe...
sim-open-call /etc/
sim-open-call /lib/x86_64-linux-gnu/

The tracing language allows you to hook into specific syscalls. It can be a bit of trial and error to find the correct syscall name: I had to change sys_enter_open to sys_enter_openat to trigger on the file open call in the C program. bpftrace -l lists all traceable syscalls.

$ bpftrace -l 'tracepoint:syscalls:sys_enter_open*'

The code inside the curly braces prints the command and the filename path to the terminal. Accessing the filename requires reading the C structure definitions to understand which attributes are available in this context.

The learning curve "aha moment" came from seeing not only the file open and write calls, but also the loading of library dependencies (stdlib requires libc). The bpftrace tool will be useful to verify that binaries actually load certain libraries, next to using ldd and nm to peek into dependencies and debug symbols.

$ ldd sim-open-call (0x00007ffe42c78000) => /lib/x86_64-linux-gnu/ (0x00007f24c7247000)
    /lib64/ (0x00007f24c747d000)

$ nm sim-open-call | grep open
                 U open@GLIBC_2.2.5

Diving deeper into source code and eBPF programs

The BPF Compiler Collection (BCC) provides examples to learn the data transfer and interaction between kernel and user space. The previous examples only hooked into syscalls and immediately returned. BCC provides kernel instrumentation in C code, and allows the frontend user space applications to be written in Python or Lua. The use cases are described as performance analysis and network traffic control - these are great insights, and add to the learning path for later knowledge verification. Python and C knowledge helps to dive into the examples more easily.

Alternatively, libbpf as a library was recommended during my research, because its bootstrap project provides more demo applications. These are real-world programs that can be used as a starting point for implementing your first eBPF program. One example is written in Rust and allows you to inspect network traffic and the packet size, following the XDP specification. eXpress Data Path (XDP) allows hooking into sending/receiving network packets at high scale, just after the interrupt and before any memory allocation. It can also be used to silently drop packets, for example (note this use case for later advanced eBPF program development).

The user needs to specify the interface number, which again involved a bit of trial and error: the interface name eth0 did not work. The example output originates from a Prometheus server instance running on the same host, generating network traffic from scraping monitoring targets as HTTP endpoints.

$ apt install rustc cargo clang rustfmt

$ git clone
$ cd libbpf-bootstrap/examples/rust
$ cargo build
$ cargo run

$ sudo ./target/debug/xdp 1 #if number

$ sudo tail -f /sys/kernel/debug/tracing/trace_pipe  

      prometheus-660     [001] d.s11 295903.782373: bpf_trace_printk: packet size: 74
      prometheus-659     [000] d.s11 295903.782735: bpf_trace_printk: packet size: 74
      prometheus-659     [000] d.s11 295903.782762: bpf_trace_printk: packet size: 54
      prometheus-671     [001] d.s11 295908.509751: bpf_trace_printk: packet size: 352
      prometheus-671     [001] d.s11 295908.513184: bpf_trace_printk: packet size: 4162
      prometheus-671     [001] d.s11 295908.513218: bpf_trace_printk: packet size: 66
      prometheus-671     [001] d.s11 295908.513295: bpf_trace_printk: packet size: 4162
      prometheus-671     [001] d.s11 295908.513307: bpf_trace_printk: packet size: 66
      prometheus-671     [001] d.s11 295908.513368: bpf_trace_printk: packet size: 1630

After building and running more examples, it was not exactly clear whether copying or modifying the source code would be a good strategy. How to cut down the XDP example to the smallest footprint possible? Maybe there are better ways to get started writing eBPF program code step-by-step and improve the learning experience.


eBPF Program Development, learning more use cases

It is important to understand the basics of BPF and eBPF before diving into developing your own programs. eBPF is an extended version of the Berkeley Packet Filter (BPF) and provides an abstract virtual machine in the Linux kernel that runs eBPF programs in a controlled environment. The "old" BPF standard in the Linux Kernel is therefore called "classic BPF" to distinguish it from eBPF.

Start with trying the bcc tools, run bpftrace and identify the use cases that help SREs and DevOps engineers during daily business and incidents. This can be tracing a program start/exit, looking into control groups (cgroups), observing TCP connections, inspecting network interfaces, and much more. The suggestion is to keep the use cases simple to ensure a steady learning curve.

Find abstractions in the form of libraries and toolchains after verifying the basic knowledge about eBPF and defining the use cases. Modern compilers and libraries are available for Go, Rust and C/C++. It is recommended to learn the programming language itself before deciding to write an eBPF program in it. From my own experience, learning Rust with knowledge of C++ or Python is a possible path forward. It also helps avoid runtime errors with memory handling, and can be a safer approach compared to C/C++ eBPF programs.

Cilium implements its eBPF functionality in an Open-Source library for Golang. In addition to learning to write your own eBPF programs, the library provides more use case examples: attach a program to entry/exit, count egress traffic packets, and inspect network interfaces (note the term XDP for later learning). The XDP program can be built with the Go compiler toolchain, and accepts the interface name as a command line argument. It uses maps to persist the network packet counts for specific IP addresses; a good use case for any type of network interface on a Kubernetes node, inspecting container traffic, or tracking traffic on embedded hardware.
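The per-address counting that the XDP program keeps in its eBPF map boils down to a hash map from source IP to packet count. As a rough illustration of that logic in plain Python (not the Go library's API, and the function name is my own), the user-space view of the map could look like this:

```python
from collections import Counter

# Sketch of the eBPF map logic: count packets per source IP address.
# In the real XDP program, this lives in a BPF hash map updated in the kernel;
# the user space program periodically reads and prints it.
packet_counts = Counter()

def on_packet(src_ip):
    """Called once per packet; the XDP hook does the equivalent in-kernel."""
    packet_counts[src_ip] += 1

# Simulated traffic, standing in for packets arriving on the interface.
for ip in ["10.0.0.5", "10.0.0.7", "10.0.0.5", "10.0.0.5"]:
    on_packet(ip)

for ip, count in packet_counts.most_common():
    print(f"{ip} => {count} packets")
```

The kernel side only increments counters; all formatting and aggregation stays in user space, which keeps the eBPF program small enough to pass the verifier.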


If you feel more comfortable with writing Rust code, the aya-rs maintainers provide a Rust developer toolchain, including a book with tutorials to learn. The examples in the book implement a similar XDP network traffic scenario which can be run directly from the Cargo build chain, making the development process more efficient.

$ git clone aya-rs-book
$ cd examples/xdp-hello
$ cargo install bpf-linker
$ cargo xtask build-ebpf
$ cargo build

$ RUST_LOG=info cargo xtask run

The example program does not keep track of IP addresses and their packet count, but this can be an excellent learning challenge to mimic the behavior from the Go library example.


An additional practical use case for aya-rs is continuous profiling: the Polar Signals developers adopted the Rust library in the Parca agent for automated function call stack analysis and better memory safety (slides from eBPF day at KubeCon EU 2022, Pull Request).

There are different ways to start developing eBPF programs. Keep in mind that the architecture follows a bytecode compiled eBPF program loaded into the Kernel, and will require a user space "collector" or "printer". The communication happens through sockets or file handles.

Testing and verifying eBPF programs

Testing eBPF programs automatically in CI/CD pipelines is tricky, because the kernel verifies the eBPF programs at load time and rejects potentially unsafe programs. Tests will require a virtual machine sandbox that loads the eBPF program and simulates the surrounding behavior for the Kernel and the eBPF program. The requirements include triggering events that in turn trigger the hooks the eBPF program code subscribed to. Depending on the purpose, this involves different Kernel interfaces and syscalls (network, file access, etc.). Creating a standalone unit test mock is hard, and would need the developer to simulate a running Kernel.
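One pragmatic workaround, until standalone verification matures, is an end-to-end test: start the tracing tool, trigger the syscall or network event it hooks, and assert on its output. The helper below sketches that pattern with plain subprocesses standing in for a real tracer and workload; the function name and timings are illustrative assumptions, not an established API:

```python
import signal
import subprocess
import time

def run_trace_test(tracer_cmd, trigger_cmd, expected, timeout=5):
    """Start a tracer process, trigger an event, then check the tracer output.

    In a real CI job, tracer_cmd could be a bpftrace one-liner and
    trigger_cmd the workload that causes the traced syscall.
    """
    tracer = subprocess.Popen(tracer_cmd, stdout=subprocess.PIPE, text=True)
    time.sleep(0.5)                 # give the tracer time to attach its probes
    subprocess.run(trigger_cmd, check=True)
    time.sleep(0.5)                 # let the event reach the tracer's stdout
    tracer.send_signal(signal.SIGTERM)
    out, _ = tracer.communicate(timeout=timeout)
    return expected in out

# Stand-in demo: a fake "tracer" that prints a line once its "probe" is attached.
ok = run_trace_test(["sh", "-c", "echo probe-attached; exec sleep 30"],
                    ["true"], "probe-attached")
```

The same start/trigger/terminate/assert cycle is what the GitLab CI/CD jobs later in this article implement with nohup, sleep, and kill.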

There are attempts to move the eBPF verifier outside of the kernel, and allow testing eBPF programs in CI/CD. Meanwhile, loading an eBPF program in CI/CD requires a running Linux VM with a CI/CD runner/executor which has elevated permissions. On Ubuntu 22 LTS, loading eBPF programs as an unprivileged user has been disabled by default, and may need to be re-enabled by running sudo sysctl kernel.unprivileged_bpf_disabled=0.
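To fail fast on unsuitable runner VMs, a preflight check can gate the eBPF jobs before anything is built. The sketch below (hypothetical helper names, my own) checks the kernel release against the 4.17 minimum mentioned earlier, and reads the unprivileged_bpf_disabled sysctl from its standard procfs location:

```python
import re

def kernel_supports_ebpf_tracing(release, min_version=(4, 17)):
    """Check a kernel release string like '5.15.0-76-generic' against a minimum."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if not match:
        return False
    return (int(match.group(1)), int(match.group(2))) >= min_version

def unprivileged_bpf_disabled(path="/proc/sys/kernel/unprivileged_bpf_disabled"):
    """Return the sysctl value; 0 means unprivileged BPF loading is allowed."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except OSError:
        return None  # not a Linux host, or sysctl unavailable

# Example gate for a CI job, using the platform module:
# if not kernel_supports_ebpf_tracing(platform.release()): sys.exit("kernel too old")
```

A check like this turns a cryptic verifier or permission error deep inside a pipeline into an immediate, readable job failure.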

Continuous testing in CI/CD

In order to provision CI/CD runner environments for continuous testing, it is recommended to spin up a Linux VM with Ansible/Terraform, install the CI/CD runner, register it with the CI/CD server, and prepare the requirements for loading and running eBPF programs. This is a common pattern across different vendors. The following example uses Ansible to install and register a GitLab Runner to a project, which can then be used to build and run eBPF programs. The GitLab Runner registers with the tag ebpf and will only execute CI/CD jobs that use this tag.


- name: GitLab Runner for eBPF
  hosts: all
  vars:
    ansible_python_interpreter: /usr/bin/python3
  tasks:
    - name: Get GitLab repository installation script
      get_url:
        url: ""
        dest: /tmp/
        mode: 0744
    - name: Install GitLab repository
      command: bash /tmp/
      args:
        creates: "/etc/apt/sources.list.d/runner_gitlab-runner.list"
      become: true
    - name: Install GitLab Runner
      apt:
        name: gitlab-runner
        state: present
        allow_downgrade: true
      become: true

    - name: Allow the gitlab-runner user to run any commands as root with sudo -u root
      community.general.sudoers:
        name: gitlab-runner sudo
        state: present
        user: gitlab-runner
        runas: root
        commands: ALL # Review this for production usage. For demos, it is enabled, and forked MR CI/CD builds won't run.

The registration requires the `gl_runner_registration_token` variable from the GitLab project settings for CI/CD Runners.

- name: GitLab Runner for eBPF - register once
  hosts: all
  vars:
    ansible_python_interpreter: /usr/bin/python3
  tasks:
    - name: "Configure GitLab Runner (running to populate config.toml)"
      command: >
        gitlab-runner register
        --url ""
        --executor "shell"
        --tag-list ebpf
        --registration-token="{{ gl_runner_registration_token }}"

The GitLab Runner will be visible in the project settings under "CI/CD > Runners".

Test a Rust-based eBPF program in CI/CD

Let’s try the CI/CD workflow with an actual eBPF program, using the aya-rs Rust library template as a demo example. First, install Rust and the required eBPF libraries locally on the Linux VM to verify everything is working.

curl -sSf | sh
source "$HOME/.cargo/env"

rustup install stable
rustup install nightly

rustup default stable
rustup toolchain add nightly
rustup component add rust-src --toolchain nightly

# required for cargo-generate
apt -y install libssl-dev

cargo install cargo-generate
cargo install bpf-linker
cargo install bindgen-cli

Next, generate a template skeleton tree for creating a demo program using the XDP (eXpress Data Path) type. Inspect the code in ebpf-chaos-demo-xdp/src/ and update the network interface name if needed. Then build and run the program, setting the log level to info (or debug).

cargo generate --name ebpf-chaos-demo-xdp -d program_type=xdp

RUST_LOG=info cargo xtask run

The demo code consists of two parts: the kernel space eBPF program in ebpf-chaos-demo-xdp-ebpf/src/ and the user space program in ebpf-chaos-demo-xdp/src/ which loads the eBPF program, and attaches it to the Kernel trace point. To build only the eBPF program, you can invoke the `build-ebpf` xtask and inspect the byte code using the llvm-objdump command:

cargo xtask build-ebpf

llvm-objdump -S target/bpfel-unknown-none/debug/ebpf-chaos-demo-xdp

The full source code is located in this GitLab project, and can be tested with a GitLab CI/CD pipeline. Note that it needs to install the Rust toolchain into the runner’s environment once. Subsequent pipeline runs will use configured caches. There are three jobs in the pipeline:

  • install-deps prepares the Rust environment, which requires the CARGO_HOME variable specified to the runner’s project directory.
  • aya-rs-xdp-build-ebpf builds the Kernel eBPF program, and runs the llvm-objdump command.
  • aya-rs-xdp-run runs the user space program, requiring sudo privileges. It puts the command into background, captures stdout, sleeps for 60 seconds, and then uses pkill to kill the xtask command, to finish by printing the captured output.

Enhancing the output analysis and thinking of more test reports from running the eBPF program is left as an exercise for the reader.

# eBPF GitLab Runner required for this project
# Note: Various commands need sudo/root access on the Linux host, see ansible-config/.
# By default, for security reasons, CI/CD pipelines are not run from forks in the parent project.
# See

default:
  tags:
    - ebpf

stages:
  - pre
  - build
  - run

variables:
  RUST_LOG: "info"
  RUNTIME: 300 # set to >= 5*60 = 300s because cargo xtask run also compiles the binary first

# These steps should not take long after subsequent runs on the Linux VM
install-deps:
  stage: pre
  script:
    - sudo apt install libssl-dev # required for cargo-generate on Ubuntu 22 LTS
    - curl -sSf -o
    - sh -y --profile default
    - source "$HOME/.cargo/env"
    - rustup install stable
    - rustup install nightly
    - rustup default stable
    - rustup toolchain add nightly
    - rustup component add rust-src --toolchain nightly
    # 'cargo install' is not idempotent. --force takes too long. Treat an error as 'ok, installed' here.
    - cargo install cargo-generate bpf-linker bindgen-cli || true

aya-rs-xdp-build-ebpf:
  stage: build
  script:
    - cd examples/ebpf-chaos-demo-xdp
    - source "$HOME/.cargo/env"
    - cargo xtask build-ebpf
    - llvm-objdump -S target/bpfel-unknown-none/debug/ebpf-chaos-demo-xdp

aya-rs-xdp-run:
  stage: run
  # We need to send the cargo xtask run command into the background, capture stdout, kill it after a defined interval, and generate a test report for CI/CD
  script:
    - cd examples/ebpf-chaos-demo-xdp
    - source "$HOME/.cargo/env"
    - rm ${CI_PROJECT_DIR}/
    - nohup cargo xtask run > ${CI_PROJECT_DIR}/nohup.out 2>&1 & echo $! > ${CI_PROJECT_DIR}/
    - sleep $RUNTIME
    - kill -s TERM `cat ${CI_PROJECT_DIR}/` || true
    - rm ${CI_PROJECT_DIR}/
    - cat "${CI_PROJECT_DIR}/nohup.out"
    - echo "Finished running eBPF program. TODO - analyze the output more."
  artifacts:
    expire_in: 30 days
    paths:
      - ${CI_PROJECT_DIR}/nohup.out

The screenshot shows the job that runs the eBPF program, with the log output captured from tracing network packets. Depending on the changes made to the source code, the output will change and can be tested. An idea would be to summarize the captured packets in a machine-readable format, and create a summary table upon termination. This is easier to consume and understand in CI/CD, as well as on the command line.
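That summary idea can be sketched quickly. Assuming the trace_pipe line format shown earlier ("comm-pid [cpu] flags timestamp: bpf_trace_printk: packet size: N"), a short Python filter, with names of my own choosing, aggregates the captured log into per-process statistics:

```python
import re
from collections import defaultdict

# Matches trace_pipe lines such as:
#   prometheus-671 [001] d.s11 295908.513184: bpf_trace_printk: packet size: 4162
LINE_RE = re.compile(r"^\s*(\S+)-(\d+)\s+.*bpf_trace_printk: packet size: (\d+)")

def summarize(lines):
    """Aggregate packet sizes per process name into a small summary table."""
    stats = defaultdict(lambda: {"packets": 0, "bytes": 0})
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # skip lines that are not packet size events
        comm, _pid, size = m.group(1), m.group(2), int(m.group(3))
        stats[comm]["packets"] += 1
        stats[comm]["bytes"] += size
    return dict(stats)

sample = [
    "prometheus-659 [000] d.s11 295903.782762: bpf_trace_printk: packet size: 54",
    "prometheus-671 [001] d.s11 295908.513184: bpf_trace_printk: packet size: 4162",
]
print(summarize(sample))
# {'prometheus': {'packets': 2, 'bytes': 4216}}
```

Piping nohup.out through such a script at the end of the CI/CD job would produce a compact, diffable report instead of hundreds of raw trace lines.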


The method of putting the process into the background may not shut it down cleanly, which calls for a better signal handling implementation, for example. It is far from perfect; you can see my learning history in this merge request. There is probably a better way: build a release binary and spawn it with supervisorctl or systemd - this is the next learning step. The termination and unloading process has been tricky to evaluate. The following snippet implements proper signal handling, but does not always unload the registered XDP links from the running Kernel. An alternative method would be spawning a fresh Linux VM for each CI/CD run to avoid these repeated failures. The disadvantage is that you will need a remote cache for the Rust builds to avoid long CI/CD run times.

// Implement signal handling for CTRL+C and SIGTERM
use tokio::signal::unix::{signal, SignalKind};


    let program: &mut Xdp = bpf.program_mut("ebpf_chaos_demo_xdp").unwrap().try_into()?;
    program.attach(&opt.iface, XdpFlags::default())
        .context("failed to attach the XDP program with default flags - try changing XdpFlags::default() to XdpFlags::SKB_MODE")?;

    // Implement signal handling for CTRL+C (SIGINT) and SIGTERM
    // CTRL+C can be used for terminal tests
    // SIGTERM will be sent from CI/CD jobs to the background process
    let mut sigterm = signal(SignalKind::terminate())?;
    let mut sigint = signal(SignalKind::interrupt())?;

    tokio::select! {
        _ = sigterm.recv() => { println!("SIGTERM shutting down") }
        _ = sigint.recv() => { println!("SIGINT shutting down") }
    }

    // Destroying the bpf object will detach and cleanup the loaded program.
    // Debug with 'bpftool link show'

Additional to-dos for CI/CD and DevSecOps workflows

The remaining challenge is to extend the eBPF programs to generate test reports, and to create runtime test environments, e.g. by running a network traffic test cycle with curl and verifying the exact packet size output. The architecture also matters: either an eBPF program gets loaded into the Kernel and has a user space application that reads its results, or the eBPF program is a single binary attaching its probes directly. The latter requires sending the program into the background in CI/CD jobs, capturing its output, performing tests, and later combining the test reports; a procedure that leaves a lot to be desired for DevSecOps workflows, but I'm sure we will get there in the near future.

Code coverage is another new territory in testing eBPF programs. There are not many tools available that help developers understand which path the code took while running in the Linux kernel, which code regions are affected, and which code isn't covered. bpfcov was created by Elastic engineers to help solve this problem, and allow developers to understand the code execution path of eBPF programs. Running automated code quality and security scans in CI/CD can also be challenging: how do you detect a programming mistake that could slow down Kernel operations? It would be interesting to see if continuous profiling of eBPF programs could work (using eBPF itself, such as the Parca project). There are also programming patterns that might circumvent the Kernel verifier and become a software supply chain attack, injecting malicious code into the released eBPF program from a contributed pull or merge request. This requires DevSecOps workflows to ensure that security measures are put in place. AI might be helpful too.


eBPF is a new way to collect Observability data; it helps with network insights and security observability and enforcement. In order to get the best libraries, tools and frameworks, we need to learn in public together to lower the knowledge barriers, and enable everyone to contribute. From testing existing tools to step-by-step tutorials to writing eBPF programs, there is a long way to go and collaborate. eBPF program testing and verification in CI/CD is a big to-do, next to bringing all ideas upstream and lowering the entry barrier to using and contributing to eBPF open-source projects.

To get started, spin up a Linux VM, use scripts/Ansible for reproducible setups, and test and develop away. Take a step back when names and Kernel technologies block the learning progress - you don't need to understand eBPF to its fullest. A general understanding of the data collection helps when production breaks. Last but not least, a practical tip: when debugging eBPF programs, consider testing on multiple distributions to avoid hitting a Kernel-specific bug.
