eBPF: The Linux Superpower Nobody Told You About
Topics: eBPF, BPF history, tracing, networking, security, observability, Cilium
Level: L2 (Operations)
Time: 60–75 minutes
Prerequisites: Basic Linux understanding helpful
The Mission¶
You need to trace every file open on a production server — without installing anything, without restarting anything, with less than 2% overhead. Or you need to filter network packets at line rate — millions per second — without touching iptables. Or you need to enforce security policies in the kernel without writing a kernel module.
All of this is eBPF. It lets you run custom programs inside the Linux kernel — safely, dynamically, at production speeds. It's the biggest change to Linux observability and networking since, well, ever.
# Right now, try this (requires root and bpftrace installed):
bpftrace -e 'tracepoint:syscalls:sys_enter_openat { printf("%s opened %s\n", comm, str(args->filename)); }'
# → nginx opened /var/log/nginx/access.log
# → python3 opened /opt/myapp/config.yaml
# → postgres opened base/16384/2619
# Every file open on the entire system. In real time. <2% overhead.
From BPF to eBPF: A Brief History¶
1992: Steven McCanne and Van Jacobson at Lawrence Berkeley Lab create BPF (Berkeley
Packet Filter) for tcpdump. BPF is a tiny virtual machine in the kernel that filters
packets efficiently — instead of copying every packet to userspace, BPF runs a filter program
inside the kernel and only passes matching packets up.
2014: Alexei Starovoitov (then at PLUMgrid, later at Facebook) extends BPF into eBPF (extended BPF). The "extended" part: more general-purpose registers, a JIT compiler, maps (key-value storage shared between kernel and userspace), helper functions, and the ability to attach to almost any kernel event — not just packets.
2016-present: eBPF explodes. Cilium (networking), Falco (security), bpftrace (tracing), Pixie (observability), Tetragon (runtime security) — all built on eBPF. The Linux Foundation creates the eBPF Foundation (2021). Facebook runs eBPF on every server in production.
Mental Model: Think of the kernel as an operating system you normally can't modify without recompiling and rebooting. eBPF is like browser JavaScript — it lets you run custom code inside the kernel, safely sandboxed, verified by the kernel before execution, and with no risk of crashing the host.
What eBPF Can Do¶
1. Observability (tracing, profiling)¶
Attach to any kernel or userspace function and collect data:
# Trace every file opened by any process (bpftrace)
bpftrace -e 'tracepoint:syscalls:sys_enter_openat {
printf("%s opened %s\n", comm, str(args->filename));
}'
# → nginx opened /var/log/nginx/access.log
# → python3 opened /opt/myapp/config.yaml
# → postgres opened /var/lib/postgresql/16/main/base/16384/2619
# Trace TCP connections (destination address and port)
bpftrace -e 'kprobe:tcp_connect {
    $sk = (struct sock *)arg0;
    printf("%s connecting to %s:%d\n", comm,
        ntop($sk->__sk_common.skc_daddr),
        bswap($sk->__sk_common.skc_dport));
}'
# skc_dport is stored in network byte order; bswap() converts it to host order
# CPU flame graph in 30 seconds (perf shown here; profile-bpfcc is the eBPF-based equivalent)
perf record -F 99 -a -g -- sleep 30
perf script | stackcollapse-perf.pl | flamegraph.pl > flame.svg  # scripts from the FlameGraph repo
Overhead: typically <2%. Compare to strace (100x+ slowdown) or Valgrind (10-50x). eBPF is one of the few tracing technologies safe to leave running on production systems.
2. Networking (Cilium replaces iptables/kube-proxy)¶
Cilium uses eBPF to implement Kubernetes networking, bypassing iptables entirely:
Traditional kube-proxy:
Packet → netfilter → iptables rules (linear scan) → DNAT → route → deliver
Cilium with eBPF:
Packet → eBPF program (hash lookup) → DNAT → deliver
(no iptables, no netfilter overhead)
At scale, iptables becomes a bottleneck. 5,000 Kubernetes Services × 10 endpoints each = 50,000 iptables rules. Adding or removing a Service rebuilds the entire rule set. Cilium's eBPF approach uses hash maps — O(1) lookup instead of O(n) linear scan.
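You can see the shape of this difference with a quick simulation in plain Python (illustrative numbers and addresses, not measured iptables or Cilium figures):

```python
# Illustrative comparison: linear rule scan (iptables-style) vs.
# hash-map lookup (eBPF-map-style). Rules and addresses are made up.
import timeit

# 50,000 "rules" mapping a destination address to a backend
rules = [(f"10.0.{i // 256}.{i % 256}", f"backend-{i}") for i in range(50_000)]
rule_map = dict(rules)

def linear_scan(dst):            # O(n): check each rule in order
    for prefix, backend in rules:
        if prefix == dst:
            return backend
    return None

def hash_lookup(dst):            # O(1): one hashed lookup
    return rule_map.get(dst)

target = "10.0.195.79"           # the very last rule: worst case for the scan
linear = timeit.timeit(lambda: linear_scan(target), number=100)
hashed = timeit.timeit(lambda: hash_lookup(target), number=100)
print(f"linear scan: {linear:.4f}s  hash lookup: {hashed:.6f}s")
```

In the kernel the same asymptotic gap shows up as packet-processing latency: every packet pays the lookup cost, so O(1) versus O(n) matters at line rate and every time a Service changes.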
3. Security (runtime enforcement)¶
eBPF programs can enforce security policies at the kernel level:
# Tetragon: detect and block suspicious process execution
apiVersion: cilium.io/v1alpha1
kind: TracingPolicy
spec:
  kprobes:
  - call: "sys_execve"
    args:
    - index: 0
      type: "string"
    selectors:
    - matchArgs:
      - index: 0
        operator: "Equal"
        values: ["/usr/bin/curl"]
      matchActions:
      - action: Sigkill  # kill any process that execs /usr/bin/curl
This runs in the kernel — no userspace process can bypass it. Traditional security tools (AppArmor, SELinux) use static profiles. eBPF enables dynamic, programmable policies.
The Safety Model¶
"Running custom code in the kernel" sounds terrifying. eBPF's safety model prevents crashes:
- Verifier: Before any eBPF program runs, the kernel verifier checks it:
  - No infinite loops (bounded execution)
  - No out-of-bounds memory access
  - No null pointer dereferences
  - No unauthorized kernel memory access
- JIT compiler: Verified programs are compiled to native machine code for performance.
- Sandboxing: eBPF programs can only call approved kernel helper functions. They can't call arbitrary kernel functions, allocate unbounded memory, or modify arbitrary state.
# If your eBPF program has a bug, the verifier catches it:
$ bpftool prog load bad_program.o /sys/fs/bpf/test
Error: failed to load program: permission denied
Verifier:
0: (85) call bpf_probe_read_kernel#113
...
R1 type=inv expected=fp ← invalid pointer, program rejected
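The "no infinite loops" guarantee comes from static analysis: before kernel 5.3, the verifier simply rejected any backward jump, which makes termination trivially provable (newer kernels also allow verifier-checked bounded loops). Here is a toy Python sketch of that single rule; the real verifier analyzes actual BPF bytecode and register state, and the instruction format below is hypothetical.

```python
# Toy model of one verifier rule: reject backward jumps so every
# program provably terminates. Not the real eBPF verifier.

def verify(program):
    """program: list of (op, arg) tuples; for 'jmp', arg is an absolute index."""
    for pc, (op, arg) in enumerate(program):
        if op == "jmp":
            if arg <= pc:
                return (False, f"back-edge at insn {pc}: possible infinite loop")
            if arg >= len(program):
                return (False, f"jump out of bounds at insn {pc}")
    return (True, "ok")

straight_line = [("load", 0), ("add", 1), ("jmp", 3), ("exit", 0)]
loops_forever = [("load", 0), ("jmp", 0)]

print(verify(straight_line))   # → (True, 'ok')
print(verify(loops_forever))   # → (False, 'back-edge at insn 1: possible infinite loop')
```

Rejecting programs before they run, rather than trusting them at runtime, is the core design choice that makes "custom code in the kernel" safe.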
Practical eBPF Tools¶
You don't need to write raw eBPF programs. These tools provide high-level interfaces:
| Tool | What it does | Example |
|---|---|---|
| bpftrace | One-liner tracing scripts | bpftrace -e 'kprobe:vfs_read { @[comm] = count(); }' |
| bcc tools | Pre-built eBPF tools (60+) | execsnoop, opensnoop, tcpconnect, biolatency |
| Cilium | K8s networking + security | Replaces kube-proxy with eBPF |
| Tetragon | Runtime security enforcement | Block suspicious syscalls in kernel |
| Falco | Runtime threat detection | Alert on suspicious container behavior |
| Pixie | Auto-instrumentation | Request-level observability without code changes |
# BCC tools — pre-built, ready to use
# Who's executing what?
execsnoop-bpfcc
# → PCOMM            PID    PPID   RET ARGS
# → curl             5678   1234     0 /usr/bin/curl
# → bash             5679   5678     0 /bin/bash
# What files are being opened?
opensnoop-bpfcc
# → PID COMM FD ERR PATH
# → 5678 nginx 3 0 /var/log/nginx/access.log
# TCP connection tracing
tcpconnect-bpfcc
# → PID COMM SADDR DADDR DPORT
# → 5678 python3 10.0.1.50 10.0.2.100 5432
# Block I/O latency histogram
biolatency-bpfcc
# → usecs count distribution
# → 0→1 523 ████████████████
# → 2→3 140 ████
# → 4→7 42 █
# → 8→15 12
# → 16→31 5
# → 32→63 3 ← these are the outliers causing latency
eBPF vs Traditional Approaches¶
| Approach | Overhead | Requires | Scope |
|---|---|---|---|
| Application logging | Varies | Code changes | Only what you instrument |
| strace | 100x+ | ptrace | One process, all syscalls |
| perf | <2% | Kernel support | CPU profiling, tracepoints |
| eBPF (bpftrace/bcc) | <2% | Kernel 4.15+ | Any kernel event, any process |
| tcpdump | Moderate | Root | Network packets only |
| Cilium (eBPF) | Lower than iptables | Kernel 4.19+ | All K8s networking |
Flashcard Check¶
Q1: What is eBPF?
A technology for running sandboxed programs inside the Linux kernel. Used for observability (tracing), networking (Cilium), and security (Tetragon/Falco).
Q2: Why is eBPF safe despite running in the kernel?
The verifier checks every program before execution: no infinite loops, no out-of-bounds access, no unauthorized memory access. Programs that fail verification are rejected.
Q3: bpftrace vs strace — when to use which?
strace for quick debugging on dev machines (easy, detailed, but 100x overhead). bpftrace for production systems (programmable, <2% overhead, but requires newer kernels).
Q4: Why does Cilium replace iptables for Kubernetes?
iptables uses linear rule scanning — 50,000 rules at 5,000 Services. Cilium uses eBPF hash maps with O(1) lookup. Faster at scale, faster rule updates.
Cheat Sheet¶
Quick eBPF Tracing (bcc tools)¶
| Task | Command |
|---|---|
| Process execution | execsnoop-bpfcc |
| File opens | opensnoop-bpfcc |
| TCP connections | tcpconnect-bpfcc |
| DNS lookups | gethostlatency-bpfcc |
| Disk I/O latency | biolatency-bpfcc |
| Syscall count by process | syscount-bpfcc |
| File system latency | ext4slower-bpfcc 1 (>1ms) |
bpftrace One-Liners¶
# Count syscalls by process
bpftrace -e 'tracepoint:raw_syscalls:sys_enter { @[comm] = count(); }'
# Histogram of read() sizes
bpftrace -e 'tracepoint:syscalls:sys_exit_read /args->ret > 0/ { @bytes = hist(args->ret); }'
# Trace process signals
bpftrace -e 'tracepoint:signal:signal_deliver { printf("%s received signal %d\n", comm, args->sig); }'
Takeaways¶
- eBPF is the future of Linux observability, networking, and security. It's not experimental — Facebook, Google, Netflix, and most cloud providers run it in production.
- <2% overhead in production. Unlike strace (100x) or Valgrind (10-50x), eBPF is safe for production tracing. This changes what you can observe.
- You don't need to write eBPF programs. bcc provides 60+ pre-built tools. Cilium, Tetragon, and Falco provide high-level interfaces.
- Cilium replaces iptables at scale. O(1) hash map lookups instead of linear iptables rule scanning. Significant at 5,000+ Services.
- The verifier is your safety net. Bad programs are rejected before they ever run; the verifier is designed to make crashing the kernel with an eBPF program impossible.
Exercises¶
- Trace file opens with bcc. Install bcc-tools (e.g., `apt install bpfcc-tools` on Ubuntu). Run `opensnoop-bpfcc` in one terminal. In another terminal, open a few files (`cat /etc/hostname`, `ls /tmp`). Observe the output showing PID, process name, and file paths. Identify which processes are opening files in the background without your input.
- Count syscalls by process. Run `syscount-bpfcc -d 10` to count syscalls over 10 seconds. While it runs, generate some activity (browse files, run commands). Review the output and identify which process made the most syscalls and which syscall was most frequent. Compare this to what `strace -c -p PID` would show for a single process.
- Trace TCP connections. Run `tcpconnect-bpfcc` in one terminal. In another, make several outbound connections: `curl https://example.com`, `dig @8.8.8.8 example.com`. Observe the source address, destination address, and port for each connection. Identify any unexpected outbound connections from background processes.
- Measure block I/O latency. Run `biolatency-bpfcc 10 1` to collect a histogram of disk I/O latency over one 10-second interval. While it runs, generate some disk activity (`find / -name "*.conf" 2>/dev/null`). Review the histogram output and identify whether any I/O operations fell in the high-latency tail (>1ms).
- Compare eBPF overhead to strace. Pick a busy workload (e.g., `find / -type f 2>/dev/null`). Time it without tracing. Time it under `strace -c`. Then run the same workload while `execsnoop-bpfcc` runs in another terminal. Compare the wall-clock times to see the overhead difference between strace and eBPF-based tracing.
Related Lessons¶
- strace: Reading the Matrix — the traditional (slower) approach to syscall tracing
- The Mysterious Latency Spike — eBPF tools for latency diagnosis
- The Container Escape — eBPF for runtime security enforcement