Runtime Security with Falco — Street-Level Ops¶
Real-world workflows for deploying Falco, tuning rules, routing alerts, and investigating runtime security events.
Quick Diagnosis Commands¶
# Is Falco running on all nodes?
kubectl get pods -n falco -l app.kubernetes.io/name=falco -o wide
# One pod per node expected (DaemonSet)
# Is Falco generating any alerts right now?
kubectl logs -n falco -l app.kubernetes.io/name=falco --tail=50 | jq .
# Count alerts by rule in the last hour
kubectl logs -n falco -l app.kubernetes.io/name=falco --since=1h | \
jq -r '.rule' | sort | uniq -c | sort -rn | head -20
# Count alerts by priority
kubectl logs -n falco -l app.kubernetes.io/name=falco --since=1h | \
jq -r '.priority' | sort | uniq -c | sort -rn
# Check which Falco driver is loaded
kubectl exec -n falco $(kubectl get pod -n falco -l app.kubernetes.io/name=falco -o name | head -1) -- \
falco --version
# Shows: driver.modern_ebpf, driver.ebpf, or driver.kmod
Default trap: The modern_ebpf driver requires kernel >= 5.8 with BTF support. If your nodes run an older kernel, Falco silently falls back to the legacy eBPF probe or the kernel module, both of which require a per-kernel probe build or download on every kernel upgrade. Always verify the active driver after deployment; do not assume modern_ebpf is running just because you requested it in your Helm values.
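A minimal preflight gate for the kernel floor, usable in a node bootstrap script. `supports_modern_ebpf` is a hypothetical helper, not a Falco tool, and it only checks the >= 5.8 version floor, not BTF availability:

```shell
# supports_modern_ebpf: true if the kernel version string meets the >= 5.8 floor
supports_modern_ebpf() {
  v=${1%%-*}                       # strip "-91-generic" style suffixes
  major=${v%%.*}                   # e.g. 5
  rest=${v#*.}; minor=${rest%%.*}  # e.g. 15
  [ "$major" -gt 5 ] || { [ "$major" -eq 5 ] && [ "$minor" -ge 8 ]; }
}

supports_modern_ebpf "$(uname -r)" \
  && echo "modern_ebpf ok" \
  || echo "expect fallback to legacy ebpf/kmod"
```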
# Check for dropped events (non-zero = kernel ring buffer overflow)
kubectl logs -n falco -l app.kubernetes.io/name=falco --since=1h | \
jq 'select(.rule == "Falco internal: syscall event drop")'
# Validate a rules file without restarting
falco --validate /etc/falco/my-custom-rules.yaml
# Check Falco's embedded webserver health endpoint
kubectl exec -n falco $(kubectl get pod -n falco -l app.kubernetes.io/name=falco -o name | head -1) -- \
curl -s localhost:8765/healthz | jq .
# List all loaded rules (names and descriptions)
kubectl exec -n falco $(kubectl get pod -n falco -l app.kubernetes.io/name=falco -o name | head -1) -- \
falco -L
# Note: falco --list lists supported filter fields, not rules
Gotcha: Alert Storm from a Noisy Built-In Rule¶
Rule: The built-in rules are aggressive defaults. Write below root and Read sensitive file untrusted will fire for many legitimate processes (Vault agent, init containers, package managers). Tune before deploying to production: override the noisy rules for your trusted processes instead of disabling them.
# 1. Find the noisiest rules
kubectl logs -n falco -l app.kubernetes.io/name=falco --since=24h | \
jq -r '.rule' | sort | uniq -c | sort -rn | head -10
# Example output:
# 4823 Read sensitive file untrusted
# 312 Write below root
# 89 Contact K8S API Server From Container
# 2. Identify what processes are triggering the noisy rule
kubectl logs -n falco -l app.kubernetes.io/name=falco --since=1h | \
jq -r 'select(.rule == "Read sensitive file untrusted") | .output_fields."proc.name"' | \
sort | uniq -c | sort -rn
# Dotted field names like proc.name must be quoted in jq
# → vault-agent (3421), consul-template (892) → legitimate
# 3. Override the rule's opt-out macro in falco_rules.local.yaml (never edit the base file)
# "Read sensitive file untrusted" references the placeholder macro
# user_known_read_sensitive_files_activities (default: never_true) for exactly this purpose
cat >> /etc/falco/falco_rules.local.yaml << 'EOF'
- macro: user_known_read_sensitive_files_activities
  condition: (proc.name in (vault-agent, consul-template, datadog-agent))
EOF
# 4. Reload rules (hot reload; no pod restart needed since Falco 0.32)
# Redefining a user_known_* placeholder macro is the supported tuning hook
kill -HUP $(pgrep falco)
# Or in Kubernetes:
kubectl exec -n falco $(kubectl get pod -n falco -l app.kubernetes.io/name=falco -o name | head -1) -- \
kill -HUP 1
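The triage loop in steps 1-2 can be rehearsed offline against a captured log snapshot; the sample lines below are fabricated for illustration:

```shell
# Write a small fabricated snapshot of Falco JSON output
cat > /tmp/falco-sample.jsonl <<'EOF'
{"rule":"Read sensitive file untrusted","output_fields":{"proc.name":"vault-agent"}}
{"rule":"Read sensitive file untrusted","output_fields":{"proc.name":"vault-agent"}}
{"rule":"Write below root","output_fields":{"proc.name":"apt"}}
EOF

# Noisiest rules first, then the offending processes for the top rule
jq -r '.rule' /tmp/falco-sample.jsonl | sort | uniq -c | sort -rn
# → 2× Read sensitive file untrusted, 1× Write below root
jq -r 'select(.rule == "Read sensitive file untrusted") | .output_fields."proc.name"' \
  /tmp/falco-sample.jsonl | sort | uniq -c
```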
Gotcha: Falco Probe Fails to Load After Kernel Upgrade¶
Rule: The eBPF probe must be compatible with the running kernel. After a node kernel upgrade, Falco may fail to start until the probe is rebuilt or the pre-built probe for the new kernel version is downloaded.
# Symptom in Falco logs:
# CRIT Failed to load BPF probe: incompatible kernel version
# Check running kernel version
kubectl get node -o wide | grep kernel
uname -r # on the node
# If using modern_ebpf (BPF CO-RE), this is usually self-resolving
# If using legacy ebpf probe, Falco downloads it from download.falco.org
# The download requires outbound internet access on the node
# For air-gapped environments, pre-build the probe on a machine with a matching kernel
# (pin the image tag to the driver version that matches your Falco release):
docker run --rm -i -t --privileged \
-v /root/.falco:/root/.falco \
-v /proc:/host/proc:ro \
-v /boot:/host/boot:ro \
-v /lib/modules:/host/lib/modules:ro \
-v /usr:/host/usr:ro \
-v /etc:/host/etc:ro \
falcosecurity/falco-driver-loader:latest
# Copy the resulting .ko or .o file from /root/.falco to your nodes
# Set DRIVERS_REPO to point to your internal artifact store in Helm values
# Switch to modern_ebpf to avoid this problem entirely (kernel >= 5.8):
helm upgrade falco falcosecurity/falco \
--namespace falco \
--set driver.kind=modern_ebpf \
--reuse-values
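Before switching, confirm the nodes actually expose BTF; CO-RE needs /sys/kernel/btf/vmlinux. A minimal check to run on the node or in a privileged debug pod (`btf_available` is a hypothetical helper; the path argument exists only so it can be exercised off-node):

```shell
# btf_available: does this node expose kernel BTF for CO-RE?
btf_available() { [ -e "${1:-/sys/kernel/btf/vmlinux}" ]; }

btf_available \
  && echo "BTF present: modern_ebpf viable" \
  || echo "no BTF: stay on legacy ebpf/kmod"
```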
Pattern: Investigating a Specific Falco Alert¶
A Falco alert fires: Terminal shell in container on a payments pod in production.
# 1. Get the full alert JSON
kubectl logs -n falco -l app.kubernetes.io/name=falco --since=1h | \
jq 'select(.rule == "Terminal shell in container" and (.output_fields."k8s.pod.name" | startswith("payments-")))'
# Output:
# {
# "rule": "Terminal shell in container",
# "priority": "NOTICE",
# "time": "2024-03-15T14:32:01.234Z",
# "output_fields": {
# "container.id": "abc123",
# "container.name": "payments",
# "k8s.pod.name": "payments-xxx-yyy",
# "k8s.ns.name": "production",
# "proc.name": "bash",
# "proc.pname": "kubectl",
# "proc.cmdline": "bash",
# "user.name": "root",
# "proc.tty": 34816
# }
# }
# 2. Cross-reference with Kubernetes audit logs
# Who ran kubectl exec?
jq 'select(.verb == "create" and .objectRef.subresource == "exec" and .objectRef.name == "payments-xxx-yyy")' \
/var/log/kubernetes/audit.log | jq '{user: .user.username, time: .requestReceivedTimestamp}'
# 3. Check what the shell did (if Falco has syscall-level rules)
kubectl logs -n falco -l app.kubernetes.io/name=falco --since=1h | \
jq 'select(.output_fields."container.id" == "abc123") | {rule: .rule, time: .time, cmd: .output_fields."proc.cmdline"}'
# 4. Capture container state for forensics (if container still running)
kubectl exec -n production payments-xxx-yyy -- ps aux
kubectl exec -n production payments-xxx-yyy -- netstat -an
kubectl exec -n production payments-xxx-yyy -- find /tmp -mmin -60 -ls 2>/dev/null  # files modified in the last hour
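For handoff to an incident channel, the alert fields above flatten into a one-line summary; a sketch with the sample alert inlined (in practice, pipe the `kubectl logs | jq select` output instead):

```shell
# One-line triage summary: priority | rule | ns/pod | user | command
alert='{"rule":"Terminal shell in container","priority":"Notice","output_fields":{"user.name":"root","proc.cmdline":"bash","k8s.ns.name":"production","k8s.pod.name":"payments-xxx-yyy"}}'
echo "$alert" | jq -r '[.priority, .rule,
  .output_fields."k8s.ns.name" + "/" + .output_fields."k8s.pod.name",
  .output_fields."user.name", .output_fields."proc.cmdline"] | join(" | ")'
# → Notice | Terminal shell in container | production/payments-xxx-yyy | root | bash
```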
Pattern: Writing a Custom Rule for Your Environment¶
Your environment policy: no container should ever write to /etc/hosts (suspicious — could indicate DNS poisoning or host file manipulation).
# falco_rules.local.yaml
- list: allowed_etc_writers
items: [] # no process is allowed
- rule: Write to /etc/hosts in container
desc: >
A process inside a container wrote to /etc/hosts. This is unusual and may
indicate an attacker attempting to redirect DNS resolution within the container
or the pod network.
condition: >
open_write
and container
and fd.name = /etc/hosts
and not proc.name in (allowed_etc_writers)
output: >
Write to /etc/hosts in container
(user=%user.name
command=%proc.cmdline
file=%fd.name
container=%container.name
image=%container.image.repository:%container.image.tag
k8s.pod=%k8s.pod.name
k8s.ns=%k8s.ns.name)
priority: WARNING
tags: [container, filesystem, dns, mitre_defense_evasion]
# Test the rule fires (run this in a test namespace, not production)
kubectl run test-write --image=alpine --rm -it --restart=Never -- \
sh -c "echo '1.2.3.4 evil.example.com' >> /etc/hosts"
# Check Falco logs for the alert
kubectl logs -n falco -l app.kubernetes.io/name=falco --tail=10 | \
jq 'select(.rule == "Write to /etc/hosts in container")'
Pattern: Routing Alerts by Priority to Different Channels¶
# falcosidekick-values.yaml for Helm
falcosidekick:
config:
# CRITICAL and ALERT → PagerDuty (immediate on-call)
pagerduty:
routingKey: "your-pd-routing-key"
minimumpriority: "critical"
# WARNING and above → Slack #security-alerts
slack:
webhookurl: "https://hooks.slack.com/services/..."
channel: "#security-alerts"
minimumpriority: "warning"
# Only include these fields in Slack message
messageformat: >-
*Rule*: `{{ .Rule }}` | *Priority*: {{ .Priority }}
*Container*: `{{ index .OutputFields "container.name" }}`
*Pod*: `{{ index .OutputFields "k8s.pod.name" }}` in `{{ index .OutputFields "k8s.ns.name" }}`
*Command*: `{{ index .OutputFields "proc.cmdline" }}`
# Everything DEBUG and above → Loki for historical search
loki:
hostport: "http://loki.monitoring.svc.cluster.local:3100"
minimumpriority: "debug"
tenant: "falco"
# CRITICAL → OpsGenie for backup paging
opsgenie:
apikey: "your-opsgenie-key"
region: "us"
minimumpriority: "critical"
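`minimumpriority` gates each output independently: an alert is forwarded only if its priority ranks at or above the threshold in Falco's priority order (emergency > alert > critical > error > warning > notice > informational > debug). A sketch of that gate; `priority_rank` and `meets_minimum` are hypothetical helpers, not part of Falcosidekick:

```shell
# priority_rank: lower number = more severe (Falco's priority order)
priority_rank() {
  case "$(echo "$1" | tr '[:upper:]' '[:lower:]')" in
    emergency) echo 0 ;; alert) echo 1 ;; critical) echo 2 ;; error) echo 3 ;;
    warning) echo 4 ;; notice) echo 5 ;; informational) echo 6 ;; debug) echo 7 ;;
    *) echo 8 ;;
  esac
}
# meets_minimum <alert_priority> <minimumpriority>: should this output fire?
meets_minimum() {
  [ "$(priority_rank "$1")" -le "$(priority_rank "$2")" ]
}

meets_minimum Critical warning && echo "route to Slack"      # Critical clears the warning bar
meets_minimum Notice warning || echo "suppressed for Slack"  # Notice does not
```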
Scenario: Falcosidekick UI Investigation¶
# Port-forward the Falcosidekick UI
kubectl port-forward -n falco svc/falco-falcosidekick-ui 2802:2802
# Open: http://localhost:2802
# In the UI you can:
# - View real-time alert stream
# - Filter by rule, priority, namespace, pod
# - See alert trend over time
# - Search historical alerts (backed by the configured outputs)
# Via API (Falcosidekick health and stats)
kubectl port-forward -n falco svc/falco-falcosidekick 2801:2801
curl http://localhost:2801/ping
# {"status": "ok"}
curl http://localhost:2801/stats | jq .
# Shows alert counts per output, per priority
curl http://localhost:2801/metrics | grep falcosidekick_events
# Prometheus metrics for Falcosidekick itself
Emergency: Cryptomining Process Detected at Runtime¶
Falco fires: Crypto mining outbound connection from a container in the batch namespace.
# 1. Immediately isolate the pod (remove it from Service endpoints)
# Remove the label the Service selects on (assumed here to be "app") so only this
# pod drops out of the endpoints, and mark it quarantine=true for follow-up.
# Do NOT patch the Service selector instead — that would drop EVERY pod.
kubectl label pod suspicious-batch-xxx -n batch app- quarantine=true
# 2. Capture forensic state BEFORE terminating
POD=suspicious-batch-xxx
NS=batch
# Process list
kubectl exec -n $NS $POD -- ps auxww > /tmp/forensics-ps.txt 2>&1
# Network connections
kubectl exec -n $NS $POD -- ss -antp > /tmp/forensics-netstat.txt 2>&1
# Open files
kubectl exec -n $NS $POD -- ls -la /proc/1/fd > /tmp/forensics-fds.txt 2>&1
# Files in /tmp
kubectl exec -n $NS $POD -- find /tmp -ls > /tmp/forensics-tmp.txt 2>&1
# Image reference (save it for the trivy scan in step 7)
kubectl get pod $POD -n $NS -o jsonpath='{.spec.containers[0].image}'
# 3. Get the full Falco alert history for this container
kubectl logs -n falco -l app.kubernetes.io/name=falco --since=24h | \
jq --arg pod "$POD" 'select(.output_fields."k8s.pod.name" == $pod)'
# 4. Check what image this pod is running and where it came from
kubectl get pod $POD -n $NS -o yaml | grep -E "image:|imagePullPolicy|serviceAccountName"
# 5. Terminate the pod
kubectl delete pod $POD -n $NS --grace-period=0 --force
# 6. Check if other pods in the namespace are affected
kubectl logs -n falco -l app.kubernetes.io/name=falco --since=24h | \
jq --arg ns "$NS" 'select(.output_fields."k8s.ns.name" == $ns) | {rule: .rule, pod: .output_fields."k8s.pod.name"}'
# 7. Scan the image that was running for known malware
# The pod is gone by now; use the image reference captured in step 2.
# trivy pulls the image from the registry, so node-side image GC does not matter
trivy image --severity CRITICAL,HIGH <image-captured-in-step-2>
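The quarantine=true label applied in step 1 can also drive network isolation: endpoint removal stops Service traffic, but not the miner's own egress. A deny-all NetworkPolicy keyed on that label closes the gap (a sketch; assumes a CNI that enforces NetworkPolicy):

```yaml
# quarantine-netpol.yaml — deny all ingress and egress for quarantined pods
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: quarantine-deny-all
  namespace: batch
spec:
  podSelector:
    matchLabels:
      quarantine: "true"
  policyTypes: [Ingress, Egress]   # no rules listed under either = deny all
```

Apply it once per namespace; any pod subsequently labeled quarantine=true is cut off immediately.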
Useful One-Liners¶
# Watch alerts in real time (formatted)
kubectl logs -n falco -l app.kubernetes.io/name=falco -f | \
jq -r '[.time, .priority, .rule, .output_fields."k8s.pod.name"] | @csv'
# Count CRITICAL alerts in last 24h
kubectl logs -n falco -l app.kubernetes.io/name=falco --since=24h | \
jq 'select(.priority == "Critical")' | wc -l
# List all unique rule names firing in the last hour
kubectl logs -n falco -l app.kubernetes.io/name=falco --since=1h | \
jq -r '.rule' | sort -u
# Filter alerts by namespace
kubectl logs -n falco -l app.kubernetes.io/name=falco --since=1h | \
jq 'select(.output_fields."k8s.ns.name" == "production")'
# Get all alerts for a specific pod
kubectl logs -n falco -l app.kubernetes.io/name=falco --since=24h | \
jq 'select(.output_fields."k8s.pod.name" | startswith("myapp-"))'
# Reload Falco rules hot (no pod restart)
kubectl exec -n falco $(kubectl get pod -n falco -l app.kubernetes.io/name=falco -o name | head -1) -- \
kill -HUP 1
# Test rule syntax before deploying
falco --validate /path/to/my-rules.yaml && echo "Rules valid"
# Check Falcosidekick is receiving events
kubectl port-forward -n falco svc/falco-falcosidekick 2801:2801 &
curl -s http://localhost:2801/stats | jq '.outputs | to_entries[] | {output: .key, events: .value.sent}'