eBPF: The Linux Superpower Nobody Told You About
Topics: eBPF, BPF history, tracing, networking, security, observability, Cilium
Level: L2 (Operations)
Time: 60–75 minutes
Prerequisites: Basic Linux understanding helpful
The Mission¶
You need to trace every file open on a production server — without installing anything, without restarting anything, with less than 2% overhead. Or you need to filter network packets at line rate — millions per second — without touching iptables. Or you need to enforce security policies in the kernel without writing a kernel module.
All of this is eBPF. It lets you run custom programs inside the Linux kernel — safely, dynamically, at production speeds. It's the biggest change to Linux observability and networking since, well, ever.
# Right now, try this (requires root and bpftrace installed):
bpftrace -e 'tracepoint:syscalls:sys_enter_openat { printf("%s opened %s\n", comm, str(args->filename)); }'
# → nginx opened /var/log/nginx/access.log
# → python3 opened /opt/myapp/config.yaml
# → postgres opened base/16384/2619
# Every file open on the entire system. In real time. <2% overhead.
From BPF to eBPF: A Brief History¶
1992: Steven McCanne and Van Jacobson at Lawrence Berkeley Lab create BPF (Berkeley
Packet Filter) for tcpdump. BPF is a tiny virtual machine in the kernel that filters
packets efficiently — instead of copying every packet to userspace, BPF runs a filter program
inside the kernel and only passes matching packets up.
2014: Alexei Starovoitov (then at PLUMgrid, later at Facebook) extends BPF into eBPF (extended BPF). The "extended" part: more general-purpose registers, a JIT compiler, maps (key-value storage shared between kernel and userspace), helper functions, and the ability to attach to almost any kernel event — not just packets.
2016-present: eBPF explodes. Cilium (networking), Falco (security), bpftrace (tracing), Pixie (observability), Tetragon (runtime security) — all built on eBPF. The Linux Foundation creates the eBPF Foundation (2021). Facebook runs eBPF on every server in production.
Mental Model: Think of the kernel as an operating system you normally can't modify without recompiling and rebooting. eBPF is like browser JavaScript — it lets you run custom code inside the kernel, safely sandboxed, verified by the kernel before execution, and with no risk of crashing the host.
What eBPF Can Do¶
1. Observability (tracing, profiling)¶
Attach to any kernel or userspace function and collect data:
# Trace every file opened by any process (bpftrace)
bpftrace -e 'tracepoint:syscalls:sys_enter_openat {
printf("%s opened %s\n", comm, str(args->filename));
}'
# → nginx opened /var/log/nginx/access.log
# → python3 opened /opt/myapp/config.yaml
# → postgres opened /var/lib/postgresql/16/main/base/16384/2619
# Trace TCP connections (destination address and port)
bpftrace -e 'kprobe:tcp_connect {
    $sk = (struct sock *)arg0;
    printf("%s connecting to %s:%d\n", comm,
        ntop($sk->__sk_common.skc_daddr),
        bswap($sk->__sk_common.skc_dport));
}'
# skc_dport is stored in network byte order; bswap() converts it to host order
# CPU flame graph in 30 seconds (perf shown here; profile-bpfcc is the eBPF-based equivalent)
perf record -F 99 -a -g -- sleep 30
perf script | stackcollapse-perf.pl | flamegraph.pl > flame.svg  # scripts from the FlameGraph repo
Overhead: typically <2%. Compare to strace (100x+ slowdown) or Valgrind (10-50x). eBPF is one of the few tracing technologies safe to leave running on production systems.
2. Networking (Cilium replaces iptables/kube-proxy)¶
Cilium uses eBPF to implement Kubernetes networking, bypassing iptables entirely:
Traditional kube-proxy:
Packet → netfilter → iptables rules (linear scan) → DNAT → route → deliver
Cilium with eBPF:
Packet → eBPF program (hash lookup) → DNAT → deliver
(no iptables, no netfilter overhead)
At scale, iptables becomes a bottleneck. 5,000 Kubernetes Services × 10 endpoints each = 50,000 iptables rules. Adding or removing a Service rebuilds the entire rule set. Cilium's eBPF approach uses hash maps — O(1) lookup instead of O(n) linear scan.
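You can see the shape of this difference with a quick simulation in plain Python (illustrative numbers and addresses, not measured iptables or Cilium figures):

```python
# Illustrative comparison: linear rule scan (iptables-style) vs.
# hash-map lookup (eBPF-map-style). Rules and addresses are made up.
import timeit

# 50,000 "rules" mapping a destination address to a backend
rules = [(f"10.0.{i // 256}.{i % 256}", f"backend-{i}") for i in range(50_000)]
rule_map = dict(rules)

def linear_scan(dst):            # O(n): check each rule in order
    for prefix, backend in rules:
        if prefix == dst:
            return backend
    return None

def hash_lookup(dst):            # O(1): one hashed lookup
    return rule_map.get(dst)

target = "10.0.195.79"           # the very last rule: worst case for the scan
linear = timeit.timeit(lambda: linear_scan(target), number=100)
hashed = timeit.timeit(lambda: hash_lookup(target), number=100)
print(f"linear scan: {linear:.4f}s  hash lookup: {hashed:.6f}s")
```

In the kernel the same asymptotic gap shows up as packet-processing latency: every packet pays the lookup cost, so O(1) versus O(n) matters at line rate and every time a Service changes.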
3. Security (runtime enforcement)¶
eBPF programs can enforce security policies at the kernel level:
# Tetragon: detect and block suspicious process execution
apiVersion: cilium.io/v1alpha1
kind: TracingPolicy
spec:
  kprobes:
  - call: "sys_execve"
    args:
    - index: 0
      type: "string"
    selectors:
    - matchArgs:
      - index: 0
        operator: "Equal"
        values: ["/usr/bin/curl"]
      matchActions:
      - action: Sigkill  # kill any process that execs /usr/bin/curl
This runs in the kernel — no userspace process can bypass it. Traditional security tools (AppArmor, SELinux) use static profiles. eBPF enables dynamic, programmable policies.
The Safety Model¶
"Running custom code in the kernel" sounds terrifying. eBPF's safety model prevents crashes:
- Verifier: Before any eBPF program runs, the kernel verifier checks it:
  - No infinite loops (bounded execution)
  - No out-of-bounds memory access
  - No null pointer dereferences
  - No unauthorized kernel memory access
- JIT compiler: Verified programs are compiled to native machine code for performance.
- Sandboxing: eBPF programs can only call approved kernel helper functions. They can't call arbitrary kernel functions, allocate unbounded memory, or modify arbitrary state.
# If your eBPF program has a bug, the verifier catches it:
$ bpftool prog load bad_program.o /sys/fs/bpf/test
Error: failed to load program: permission denied
Verifier:
0: (85) call bpf_probe_read_kernel#113
...
R1 type=inv expected=fp ← invalid pointer, program rejected
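The "no infinite loops" guarantee comes from static analysis: before kernel 5.3, the verifier simply rejected any backward jump, which makes termination trivially provable (newer kernels also allow verifier-checked bounded loops). Here is a toy Python sketch of that single rule; the real verifier analyzes actual BPF bytecode and register state, and the instruction format below is hypothetical.

```python
# Toy model of one verifier rule: reject backward jumps so every
# program provably terminates. Not the real eBPF verifier.

def verify(program):
    """program: list of (op, arg) tuples; for 'jmp', arg is an absolute index."""
    for pc, (op, arg) in enumerate(program):
        if op == "jmp":
            if arg <= pc:
                return (False, f"back-edge at insn {pc}: possible infinite loop")
            if arg >= len(program):
                return (False, f"jump out of bounds at insn {pc}")
    return (True, "ok")

straight_line = [("load", 0), ("add", 1), ("jmp", 3), ("exit", 0)]
loops_forever = [("load", 0), ("jmp", 0)]

print(verify(straight_line))   # → (True, 'ok')
print(verify(loops_forever))   # → (False, 'back-edge at insn 1: possible infinite loop')
```

Rejecting programs before they run, rather than trusting them at runtime, is the core design choice that makes "custom code in the kernel" safe.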
Practical eBPF Tools¶
You don't need to write raw eBPF programs. These tools provide high-level interfaces:
| Tool | What it does | Example |
|---|---|---|
| bpftrace | One-liner tracing scripts | bpftrace -e 'kprobe:vfs_read { @[comm] = count(); }' |
| bcc tools | Pre-built eBPF tools (60+) | execsnoop, opensnoop, tcpconnect, biolatency |
| Cilium | K8s networking + security | Replaces kube-proxy with eBPF |
| Tetragon | Runtime security enforcement | Block suspicious syscalls in kernel |
| Falco | Runtime threat detection | Alert on suspicious container behavior |
| Pixie | Auto-instrumentation | Request-level observability without code changes |
# BCC tools — pre-built, ready to use
# Who's executing what?
execsnoop-bpfcc
# → PCOMM            PID    PPID   RET ARGS
# → curl             5678   1234     0 /usr/bin/curl
# → bash             5679   5678     0 /bin/bash
# What files are being opened?
opensnoop-bpfcc
# → PID COMM FD ERR PATH
# → 5678 nginx 3 0 /var/log/nginx/access.log
# TCP connection tracing
tcpconnect-bpfcc
# → PID COMM SADDR DADDR DPORT
# → 5678 python3 10.0.1.50 10.0.2.100 5432
# Block I/O latency histogram
biolatency-bpfcc
# → usecs count distribution
# → 0→1 523 ████████████████
# → 2→3 140 ████
# → 4→7 42 █
# → 8→15 12
# → 16→31 5
# → 32→63 3 ← these are the outliers causing latency
eBPF vs Traditional Approaches¶
| Approach | Overhead | Requires | Scope |
|---|---|---|---|
| Application logging | Varies | Code changes | Only what you instrument |
| strace | 100x+ | ptrace | One process, all syscalls |
| perf | <2% | Kernel support | CPU profiling, tracepoints |
| eBPF (bpftrace/bcc) | <2% | Kernel 4.15+ | Any kernel event, any process |
| tcpdump | Moderate | Root | Network packets only |
| Cilium (eBPF) | Lower than iptables | Kernel 4.19+ | All K8s networking |
Flashcard Check¶
Q1: What is eBPF?
A technology for running sandboxed programs inside the Linux kernel. Used for observability (tracing), networking (Cilium), and security (Tetragon/Falco).
Q2: Why is eBPF safe despite running in the kernel?
The verifier checks every program before execution: no infinite loops, no out-of-bounds access, no unauthorized memory access. Programs that fail verification are rejected.
Q3: bpftrace vs strace — when to use which?
strace for quick debugging on dev machines (easy, detailed, but 100x overhead). bpftrace for production systems (programmable, <2% overhead, but requires newer kernels).
Q4: Why does Cilium replace iptables for Kubernetes?
iptables uses linear rule scanning — 50,000 rules at 5,000 Services. Cilium uses eBPF hash maps with O(1) lookup. Faster at scale, faster rule updates.
Cheat Sheet¶
Quick eBPF Tracing (bcc tools)¶
| Task | Command |
|---|---|
| Process execution | execsnoop-bpfcc |
| File opens | opensnoop-bpfcc |
| TCP connections | tcpconnect-bpfcc |
| DNS lookups | gethostlatency-bpfcc |
| Disk I/O latency | biolatency-bpfcc |
| Syscall count by process | syscount-bpfcc |
| File system latency | ext4slower-bpfcc 1 (>1ms) |
bpftrace One-Liners¶
# Count syscalls by process
bpftrace -e 'tracepoint:raw_syscalls:sys_enter { @[comm] = count(); }'
# Histogram of read() sizes
bpftrace -e 'tracepoint:syscalls:sys_exit_read /args->ret > 0/ { @bytes = hist(args->ret); }'
# Trace process signals
bpftrace -e 'tracepoint:signal:signal_deliver { printf("%s received signal %d\n", comm, args->sig); }'
Takeaways¶
- eBPF is the future of Linux observability, networking, and security. It's not experimental — Facebook, Google, Netflix, and most cloud providers run it in production.
- <2% overhead in production. Unlike strace (100x) or Valgrind (10-50x), eBPF is safe for production tracing. This changes what you can observe.
- You don't need to write eBPF programs. bcc provides 60+ pre-built tools. Cilium, Tetragon, and Falco provide high-level interfaces.
- Cilium replaces iptables at scale. O(1) hash map lookups instead of linear iptables rule scanning. Significant at 5,000+ Services.
- The verifier is your safety net. Bad programs are rejected before they ever run; the verifier is designed to make crashing the kernel with an eBPF program impossible.
Exercises¶
- Trace file opens with bcc. Install bcc-tools (e.g., `apt install bpfcc-tools` on Ubuntu). Run `opensnoop-bpfcc` in one terminal. In another terminal, open a few files (`cat /etc/hostname`, `ls /tmp`). Observe the output showing PID, process name, and file paths. Identify which processes are opening files in the background without your input.
- Count syscalls by process. Run `syscount-bpfcc -d 10` to count syscalls over 10 seconds. While it runs, generate some activity (browse files, run commands). Review the output and identify which process made the most syscalls and which syscall was most frequent. Compare this to what `strace -c -p PID` would show for a single process.
- Trace TCP connections. Run `tcpconnect-bpfcc` in one terminal. In another, make several outbound connections: `curl https://example.com`, `dig @8.8.8.8 example.com`. Observe the source address, destination address, and port for each connection. Identify any unexpected outbound connections from background processes.
- Measure block I/O latency. Run `biolatency-bpfcc 10 1` to collect a histogram of disk I/O latency over one 10-second interval. While it runs, generate some disk activity (`find / -name "*.conf" 2>/dev/null`). Review the histogram output and identify whether any I/O operations fell in the high-latency tail (>1ms).
- Compare eBPF overhead to strace. Pick a busy workload (e.g., `find / -type f 2>/dev/null`). Time it without tracing. Time it under `strace -c`. Then run the same workload while `execsnoop-bpfcc` runs in another terminal. Compare the wall-clock times to see the overhead difference between strace and eBPF-based tracing.
Related Lessons¶
- strace: Reading the Matrix — the traditional (slower) approach to syscall tracing
- The Mysterious Latency Spike — eBPF tools for latency diagnosis
- The Container Escape — eBPF for runtime security enforcement