eBPF & Modern Linux Observability - Street-Level Ops

What experienced Linux performance engineers know about using eBPF tools in production without causing more problems than you're solving.

Quick Diagnosis Commands

# Verify eBPF support on this kernel
uname -r                                          # Need 4.9+ minimum, 5.8+ ideal
grep -E 'CONFIG_BPF(_SYSCALL)?=' /boot/config-$(uname -r)   # Want CONFIG_BPF=y and CONFIG_BPF_SYSCALL=y
bpftool feature probe kernel 2>/dev/null | head -20

> **One-liner:** Kernel 5.8 is the magic number for eBPF. Before 5.8, you need the overpowered `CAP_SYS_ADMIN`. After 5.8, the dedicated `CAP_BPF` and `CAP_PERFMON` capabilities give you tracing without full root.
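A minimal sketch of gating on that 5.8 threshold in a script. `kver_ge` is our helper, not a standard command; it uses `sort -V` for version comparison:

```shell
# kver_ge A B: true if kernel version A >= B (helper of ours, not a standard tool)
kver_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

if kver_ge "$(uname -r | cut -d- -f1)" 5.8; then
  echo "5.8+: CAP_BPF / CAP_PERFMON available"
else
  echo "pre-5.8: tracing needs CAP_SYS_ADMIN"
fi
```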

# Check if BCC tools are installed
dpkg -l 2>/dev/null | grep bpfcc || rpm -qa 2>/dev/null | grep bcc-tools

# Quick process execution audit — what's spawning right now?
timeout 10 execsnoop-bpfcc 2>/dev/null || timeout 10 /usr/share/bcc/tools/execsnoop

# Quick network connection audit — who's connecting where?
timeout 10 tcpconnect-bpfcc 2>/dev/null || timeout 10 /usr/share/bcc/tools/tcpconnect

# Quick disk latency check — is I/O slow?
timeout 5 biolatency-bpfcc 2>/dev/null || timeout 5 /usr/share/bcc/tools/biolatency

# Check scheduler pressure — are processes waiting for CPU?
timeout 5 runqlat-bpfcc 2>/dev/null || timeout 5 /usr/share/bcc/tools/runqlat

# Verify kernel headers are installed (required for BCC)
ls /lib/modules/$(uname -r)/build/ >/dev/null 2>&1 && echo "Headers OK" || echo "MISSING HEADERS"

Debug clue: If biolatency or runqlat starts but shows zero output on a busy system, the BPF JIT may be disabled at runtime. Check with sysctl net.core.bpf_jit_enable — a value of 0 means BPF programs run in the interpreter, which is slower and can drop high-frequency events under load. (If the kernel was built without CONFIG_BPF_JIT at all, the sysctl won't exist.)
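The JIT state can be read directly from /proc, which also works where the `sysctl` binary isn't installed:

```shell
# 1 = JIT on, 0 = interpreter, 2 = JIT with debug output.
# Kernels built with CONFIG_BPF_JIT_ALWAYS_ON pin this to 1 and reject writes.
cat /proc/sys/net/core/bpf_jit_enable 2>/dev/null || echo "bpf_jit_enable sysctl not present"
# To enable (root): sysctl -w net.core.bpf_jit_enable=1
```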

Gotcha: BCC Tool Fails With "Failed to compile BPF module"

You install bcc-tools and run execsnoop. It immediately fails with a compilation error. The kernel headers don't match the running kernel, or they're missing entirely.

Fix:

# Install matching kernel headers
apt install -y linux-headers-$(uname -r)    # Debian/Ubuntu
dnf install -y kernel-devel-$(uname -r)      # RHEL/CentOS

# If the exact version isn't available (common after kernel updates without reboot)
# Option 1: Reboot into the installed kernel
# Option 2: Install headers for the installed kernel version
apt list --installed 2>/dev/null | grep linux-image
dnf list installed kernel

# For container hosts: headers must be on the HOST, not in the container
# Mount /lib/modules and /usr/src from host if running BCC in a container
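A quick way to see your options after a failed install: list which kernels on the box actually have buildable headers, and whether the running one is among them (a small sketch, not a packaged tool):

```shell
# List kernels with headers present; flag the running kernel if it matches
running=$(uname -r)
for d in /lib/modules/*/build; do
  [ -d "$d" ] || continue            # skip if the glob matched nothing
  k=${d%/build}; k=${k#/lib/modules/}
  if [ "$k" = "$running" ]; then echo "$k   <-- running"; else echo "$k"; fi
done
```

If the running kernel isn't in the list, either reboot into one that is, or install its headers.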

Gotcha: eBPF Tool Says "Operation not permitted"

You're running as a regular user or inside a container with restricted capabilities. eBPF requires CAP_BPF (kernel 5.8+) or CAP_SYS_ADMIN (older kernels).

Fix:

# Run as root (simplest)
sudo execsnoop-bpfcc

# Or grant specific capabilities (better for non-interactive use;
# the bpftrace path varies by distro, so resolve it with command -v)
sudo setcap cap_bpf,cap_perfmon+ep "$(command -v bpftrace)"

# In a container: add required capabilities
# docker run --cap-add=SYS_ADMIN --cap-add=BPF ...
# Or use --privileged (not recommended for production)

# In Kubernetes: add to securityContext
# securityContext:
#   capabilities:
#     add: ["BPF", "PERFMON", "SYS_ADMIN"]
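Before reaching for capabilities, verify what the current process actually holds. CapEff in /proc/self/status is the effective set as a hex mask (bit 38 = CAP_PERFMON, bit 39 = CAP_BPF); `capsh` from the libcap tools decodes it if installed:

```shell
# Show the effective capability mask of this shell
grep CapEff /proc/self/status
# Decode it to names if capsh is available
command -v capsh >/dev/null 2>&1 \
  && capsh --decode="$(awk '/CapEff/ {print $2}' /proc/self/status)" \
  || echo "capsh not installed (libcap tools)"
```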

Gotcha: Tracing Floods the Terminal and Eats CPU

You run opensnoop on a busy production server. It traces every file open across all processes. Your terminal fills with thousands of lines per second. The tracing itself starts consuming noticeable CPU.

Fix: Always filter your traces:

# Filter by process name
opensnoop-bpfcc -n nginx

# Filter by PID
opensnoop-bpfcc -p 14500

# opensnoop has no path-filter flag (-f filters open() flags such as O_WRONLY),
# so filter paths with grep
opensnoop-bpfcc -n nginx | grep /etc/nginx/

# Limit duration
timeout 30 opensnoop-bpfcc -n nginx

# For bpftrace: always filter in the probe
bpftrace -e 'tracepoint:syscalls:sys_enter_openat /comm == "nginx"/ {
  printf("%s\n", str(args->filename));
}'
# NOT: tracepoint:syscalls:sys_enter_openat { ... }  (traces everything)
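When a tool has no filter flag at all, post-process the flood into a top-N summary instead of scrolling raw lines. `summarize_top` is our helper, not a BCC tool: it counts column 1 of whatever tracer output you pipe in:

```shell
# Count occurrences of column 1, print the N busiest values (default 10)
summarize_top() {
  awk '{print $1}' | sort | uniq -c | sort -rn | head -n "${1:-10}"
}

# Live usage (needs root): timeout 10 execsnoop-bpfcc | summarize_top 5
# Demo with canned input:
printf 'nginx a\nnginx b\ncron c\n' | summarize_top 2
```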

Gotcha: biolatency Shows Fast I/O But Application Is Slow

biolatency shows all block I/O under 1ms. But your application's read latency is 50ms. The block device is fast, but the filesystem layer is adding latency — maybe due to extent fragmentation, journal commits, or metadata lookups.

Fix: Use filesystem-level tracers instead of block-level:

# Trace slow ext4 operations (> 1ms threshold)
ext4slower-bpfcc 1

# Trace slow XFS operations
xfsslower-bpfcc 1

# Trace slow NFS operations (if using NFS)
nfsslower-bpfcc 1

# These trace at the VFS/filesystem layer, not the block layer
# They catch issues that biolatency misses

Gotcha: tcpretrans Shows Retransmissions But Network Is "Fine"

Network team says the network is fine. But tcpretrans shows steady retransmissions. The issue is application-level: the receiving application is slow to read from the socket, so the receive buffer fills, the receiver advertises a zero window, and the sender's stalled segments and window probes surface as retransmissions.

Fix: Correlate retransmissions with application-level metrics:

# Check TCP receive-queue backlog (Recv-Q = bytes waiting for the application to read)
ss -tnp | tail -n +2 | sort -k2 -rn | head -20
# If Recv-Q is consistently high, the application is slow to consume data

# Trace receive-queue length for a specific process (kprobes fire system-wide,
# so filter in the predicate — bpftrace's -p flag does not restrict kprobes)
bpftrace -e 'kprobe:tcp_recvmsg /comm == "myapp"/ {
  $sk = (struct sock *)arg0;
  printf("PID %d recv_q: %d\n", pid, $sk->sk_receive_queue.qlen);
}'
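The `ss` check above can be automated. `high_recvq` is a small parsing helper of ours that flags sockets whose Recv-Q exceeds a byte threshold; shown against canned input so the column handling is visible, live use is `ss -tn | high_recvq 65536`:

```shell
# Print Recv-Q and local address for sockets over the threshold (skips the header row)
high_recvq() {
  awk -v t="$1" 'NR > 1 && $2+0 > t+0 {print $2, $4}'
}

# Demo with canned ss-style output:
printf 'State Recv-Q Send-Q Local Peer\nESTAB 120000 0 10.0.0.1:80 10.0.0.2:5000\nESTAB 0 0 10.0.0.1:80 10.0.0.3:5001\n' \
  | high_recvq 65536
```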

Pattern: The Performance Triage Sequence

When you don't know what's slow, follow this order:

Step 1: CPU scheduling (runqlat)
  → High run queue latency? CPUs overloaded.
  → Find the hog: bpftrace -e 'profile:hz:99 { @[comm] = count(); }'

Step 2: Disk I/O (biolatency)
  → High latency (>10ms for SSD, >50ms for HDD)? Storage problem.
  → Find the cause: biosnoop to see specific I/O operations

Step 3: Network (tcpretrans + tcplife)
  → Retransmissions? Network quality issue.
  → Short-lived connections with no data? Connection churn.

Step 4: Filesystem (ext4slower / xfsslower)
  → Block I/O is fast but file ops are slow? Filesystem layer issue.

Step 5: Application (custom bpftrace)
  → Trace specific syscalls, lock contention, or user-space functions
  → This is where you go deep on the specific application
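The sequence above can be wrapped in a small time-boxed runner so no single step stalls the investigation. `triage_step` is a sketch of ours, not a standard tool:

```shell
# Run a tool only if installed, capped at DURATION seconds (default 10)
triage_step() {
  local tool=$1; shift
  command -v "$tool" >/dev/null 2>&1 || { echo "skip: $tool not installed"; return 0; }
  timeout "${DURATION:-10}" "$tool" "$@"
}

# Incident usage (needs root for the BCC tools):
#   triage_step runqlat-bpfcc -m
#   triage_step biolatency-bpfcc -m
#   triage_step tcpretrans-bpfcc
triage_step sleep 0.1 && echo "step completed"
```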

Pattern: Building a Production eBPF Runbook

Create runbook entries for your top 5 performance scenarios:

Scenario: High API latency
  Tool: runqlat → biolatency → tcpretrans
  Expected output: [paste baseline histograms]
  Abnormal indicators: runqlat > 1ms, biolatency > 10ms, retransmits > 10/min
  Escalation: if runqlat shows CPU saturation, check for noisy neighbors

Scenario: Connection exhaustion
  Tool: tcplife → tcpconnect → ss -s
  Expected output: connections lasting < 5s, < 100 concurrent
  Abnormal indicators: connections > 300s, concurrent > 1000
  Escalation: connection pool leak, contact application team

Scenario: Disk space growing unexpectedly
  Tool: opensnoop → filetop → biosnoop
  Expected output: known log files being written
  Abnormal indicators: unknown processes writing, unexpected paths
  Escalation: if process is unfamiliar, possible compromise

Pattern: eBPF for Security Monitoring

# Detect unexpected privilege escalation
bpftrace -e 'tracepoint:syscalls:sys_enter_setuid /args->uid == 0/ {
  printf("ALERT: %s (PID %d) escalated to root\n", comm, pid);
}'

# Detect unexpected outbound connections
tcpconnect-bpfcc | grep -vE '(^|[^0-9])(10\.|172\.(1[6-9]|2[0-9]|3[01])\.|192\.168\.|127\.)'
# Shows connections to non-RFC1918 (and non-loopback) addresses — unexpected external traffic

# Detect unexpected process execution
execsnoop-bpfcc | grep -v -E '(bash|python|node|nginx|postgres)'
# Shows process executions outside your expected process list

# Detect file access to sensitive paths
opensnoop-bpfcc | grep -E '(/etc/shadow|/etc/passwd|\.ssh/|\.env)'
# Alert on access to credential files
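For the execsnoop filter above, keeping expected process names in a file makes the allow-list auditable instead of buried in a regex. `unexpected_execs` and the file path are a sketch of ours:

```shell
# Drop lines matching any fixed string in the allow-list file
unexpected_execs() {
  grep -v -F -f "$1"
}

# Demo: /tmp/expected.txt is a hypothetical one-name-per-line allow list
printf 'bash\nnginx\n' > /tmp/expected.txt
printf 'bash x\nncat y\nnginx z\n' | unexpected_execs /tmp/expected.txt
# Live usage (needs root): execsnoop-bpfcc | unexpected_execs /tmp/expected.txt
```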

Emergency: Production Server Suddenly Slow — No Obvious Cause

CPU at 30%, memory fine, disk space fine. But everything is slow. No recent deploys.

1. Run queue latency (is something hogging CPU in bursts?)
   runqlat-bpfcc -m     # milliseconds
   → Look for a bimodal distribution: most fast, some very slow
   → If so: something is monopolizing a CPU core periodically

2. Find the CPU hog
   bpftrace -e 'profile:hz:99 { @[comm] = count(); } interval:s:10 { exit(); }'
   # (bpftrace's -d flag is debug mode, not duration — use an interval probe or timeout)
   → Top process by on-CPU samples in 10 seconds

3. Check for I/O wait hiding in averages
   biosnoop-bpfcc | head -50
   → Look for individual I/O operations with high latency
   → NFS/CIFS mounts are a common hidden cause

4. Check for network retransmissions
   tcpretrans-bpfcc
   → Retransmissions add latency without showing up in CPU/memory

5. Check for lock contention (advanced)
   bpftrace -e 'tracepoint:lock:contention_begin { @[kstack] = count(); }'   # needs kernel 5.19+
   → Kernel lock contention can cause system-wide slowdowns

Most common root causes at this stage:
  - Noisy neighbor (in VM or container environment)
  - NFS mount gone slow (check: nfsslower 1)
  - Kernel bug / regression after unnoticed update
  - Transparent Huge Pages compaction (check: grep compact_ /proc/vmstat)
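For the THP case, this is the check spelled out: rising compact_stall while the system is slow is the signature (counter names vary slightly by kernel version):

```shell
# Compaction counters: sample twice a minute apart; a rising compact_stall
# means allocations are blocking on memory compaction
grep -E '^compact_' /proc/vmstat 2>/dev/null || echo "no compaction counters exposed"

# Current THP policy ([always] is the usual suspect)
cat /sys/kernel/mm/transparent_hugepage/enabled 2>/dev/null || true
```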

Emergency: Need eBPF But Tools Aren't Installed

Production server is having issues. You need eBPF tools. Nothing is installed. You can't apt/dnf install because the server has no internet access (airgapped or restricted network).

Option 1: bpftrace is a single binary
  - Download from https://github.com/iovisor/bpftrace/releases
  - scp it to the server: scp bpftrace-static prod-server:/tmp/
  - chmod +x /tmp/bpftrace && /tmp/bpftrace -e 'BEGIN { printf("works\n"); }'

Option 2: Use built-in tracing (no eBPF tools needed)
  - /proc is always available:
    cat /proc/$(pgrep myapp)/status          # Process memory and state
    cat /proc/$(pgrep myapp)/io              # Process I/O stats
    cat /proc/$(pgrep myapp)/fd | wc -l      # File descriptor count
    cat /proc/$(pgrep myapp)/net/tcp          # TCP connections

  - perf is often pre-installed:
    perf top                                  # Live CPU profiling
    perf stat -p $(pgrep myapp) sleep 10      # Process-level counters

  - ftrace is in every kernel (no extra tools; it traces KERNEL functions only):
    cd /sys/kernel/debug/tracing      # or /sys/kernel/tracing on newer kernels
    echo function > current_tracer
    echo 'tcp_sendmsg' > set_ftrace_filter    # takes kernel function names, not app functions
    cat trace_pipe
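One wrinkle with the /proc/PID/net/tcp fallback above: addresses are stored as little-endian hex. A quick IPv4 decoder (`hex2ip` is our helper, bash substring syntax):

```shell
# Decode a /proc/net/tcp address field like 0100007F into dotted quad
hex2ip() {
  printf '%d.%d.%d.%d\n' "0x${1:6:2}" "0x${1:4:2}" "0x${1:2:2}" "0x${1:0:2}"
}

hex2ip 0100007F    # → 127.0.0.1 (loopback)
```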

Option 3: Pre-stage tools in your base image
  - Include bcc-tools and bpftrace in your server provisioning
  - If you need them once, you'll need them again

> **War story:** During a production incident, an SRE spent 40 minutes trying to install bcc-tools on an airgapped server while the outage continued. The static bpftrace binary was 25 MB and could have been scp'd in 10 seconds. After the incident, the team added bpftrace to their base AMI and never needed it urgently again — until six months later, when they did.

Quick Reference