Runtime Security with Falco — Street-Level Ops¶
Real-world workflows for deploying Falco, tuning rules, routing alerts, and investigating runtime security events.
Quick Diagnosis Commands¶
# Is Falco running on all nodes?
kubectl get pods -n falco -l app.kubernetes.io/name=falco -o wide
# One pod per node expected (DaemonSet)
# Is Falco generating any alerts right now?
kubectl logs -n falco -l app.kubernetes.io/name=falco --tail=50 | jq .
# Count alerts by rule in the last hour
kubectl logs -n falco -l app.kubernetes.io/name=falco --since=1h | \
jq -r '.rule' | sort | uniq -c | sort -rn | head -20
# Count alerts by priority
kubectl logs -n falco -l app.kubernetes.io/name=falco --since=1h | \
jq -r '.priority' | sort | uniq -c | sort -rn
# Check which Falco driver is loaded
kubectl exec -n falco $(kubectl get pod -n falco -l app.kubernetes.io/name=falco -o name | head -1) -- \
falco --version
# Shows: driver.modern_ebpf, driver.ebpf, or driver.kmod
Default trap: The modern_ebpf driver requires kernel >= 5.8 with BTF support. If your nodes run an older kernel, Falco silently falls back to the legacy eBPF probe or the kernel module, both of which require a per-kernel probe build or download on every kernel upgrade. Always verify the active driver after deployment; do not assume modern_ebpf is running just because you requested it in your Helm values.
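A minimal preflight gate for the kernel floor, usable in a node bootstrap script. `supports_modern_ebpf` is a hypothetical helper, not a Falco tool, and it only checks the >= 5.8 version floor, not BTF availability:

```shell
# supports_modern_ebpf: true if the kernel version string meets the >= 5.8 floor
supports_modern_ebpf() {
  v=${1%%-*}                       # strip "-91-generic" style suffixes
  major=${v%%.*}                   # e.g. 5
  rest=${v#*.}; minor=${rest%%.*}  # e.g. 15
  [ "$major" -gt 5 ] || { [ "$major" -eq 5 ] && [ "$minor" -ge 8 ]; }
}

supports_modern_ebpf "$(uname -r)" \
  && echo "modern_ebpf ok" \
  || echo "expect fallback to legacy ebpf/kmod"
```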
# Check for dropped events (non-zero = kernel ring buffer overflow)
kubectl logs -n falco -l app.kubernetes.io/name=falco --since=1h | \
jq 'select(.rule == "Falco internal: syscall event drop")'
# Validate a rules file without restarting
falco --validate /etc/falco/my-custom-rules.yaml
# Check Falco's embedded webserver health endpoint
kubectl exec -n falco $(kubectl get pod -n falco -l app.kubernetes.io/name=falco -o name | head -1) -- \
curl -s localhost:8765/healthz | jq .
# List all loaded rules (names and descriptions)
kubectl exec -n falco $(kubectl get pod -n falco -l app.kubernetes.io/name=falco -o name | head -1) -- \
falco -L
# Note: falco --list lists supported filter fields, not rules
Gotcha: Alert Storm from a Noisy Built-In Rule¶
Rule: The built-in rules are aggressive defaults. Write below root and Read sensitive file untrusted will fire for many legitimate processes (Vault agent, init containers, package managers). Tune before deploying to production: override the noisy rules for your trusted processes instead of disabling them.
# 1. Find the noisiest rules
kubectl logs -n falco -l app.kubernetes.io/name=falco --since=24h | \
jq -r '.rule' | sort | uniq -c | sort -rn | head -10
# Example output:
# 4823 Read sensitive file untrusted
# 312 Write below root
# 89 Contact K8S API Server From Container
# 2. Identify what processes are triggering the noisy rule
kubectl logs -n falco -l app.kubernetes.io/name=falco --since=1h | \
jq -r 'select(.rule == "Read sensitive file untrusted") | .output_fields."proc.name"' | \
sort | uniq -c | sort -rn
# Dotted field names like proc.name must be quoted in jq
# → vault-agent (3421), consul-template (892) → legitimate
# 3. Override the rule's opt-out macro in falco_rules.local.yaml (never edit the base file)
# "Read sensitive file untrusted" references the placeholder macro
# user_known_read_sensitive_files_activities (default: never_true) for exactly this purpose
cat >> /etc/falco/falco_rules.local.yaml << 'EOF'
- macro: user_known_read_sensitive_files_activities
  condition: (proc.name in (vault-agent, consul-template, datadog-agent))
EOF
# 4. Reload rules (hot reload; no pod restart needed since Falco 0.32)
# Redefining a user_known_* placeholder macro is the supported tuning hook
kill -HUP $(pgrep falco)
# Or in Kubernetes:
kubectl exec -n falco $(kubectl get pod -n falco -l app.kubernetes.io/name=falco -o name | head -1) -- \
kill -HUP 1
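The triage loop in steps 1-2 can be rehearsed offline against a captured log snapshot; the sample lines below are fabricated for illustration:

```shell
# Write a small fabricated snapshot of Falco JSON output
cat > /tmp/falco-sample.jsonl <<'EOF'
{"rule":"Read sensitive file untrusted","output_fields":{"proc.name":"vault-agent"}}
{"rule":"Read sensitive file untrusted","output_fields":{"proc.name":"vault-agent"}}
{"rule":"Write below root","output_fields":{"proc.name":"apt"}}
EOF

# Noisiest rules first, then the offending processes for the top rule
jq -r '.rule' /tmp/falco-sample.jsonl | sort | uniq -c | sort -rn
# → 2× Read sensitive file untrusted, 1× Write below root
jq -r 'select(.rule == "Read sensitive file untrusted") | .output_fields."proc.name"' \
  /tmp/falco-sample.jsonl | sort | uniq -c
```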
Gotcha: Falco Probe Fails to Load After Kernel Upgrade¶
Rule: The eBPF probe must be compatible with the running kernel. After a node kernel upgrade, Falco may fail to start until the probe is rebuilt or the pre-built probe for the new kernel version is downloaded.
# Symptom in Falco logs:
# CRIT Failed to load BPF probe: incompatible kernel version
# Check running kernel version
kubectl get node -o wide | grep kernel
uname -r # on the node
# If using modern_ebpf (BPF CO-RE), this is usually self-resolving
# If using legacy ebpf probe, Falco downloads it from download.falco.org
# The download requires outbound internet access on the node
# For air-gapped environments, pre-build the probe on a machine with a matching kernel
# (pin the image tag to the driver version that matches your Falco release):
docker run --rm -i -t --privileged \
-v /root/.falco:/root/.falco \
-v /proc:/host/proc:ro \
-v /boot:/host/boot:ro \
-v /lib/modules:/host/lib/modules:ro \
-v /usr:/host/usr:ro \
-v /etc:/host/etc:ro \
falcosecurity/falco-driver-loader:latest
# Copy the resulting .ko or .o file from /root/.falco to your nodes
# Set DRIVERS_REPO to point to your internal artifact store in Helm values
# Switch to modern_ebpf to avoid this problem entirely (kernel >= 5.8):
helm upgrade falco falcosecurity/falco \
--namespace falco \
--set driver.kind=modern_ebpf \
--reuse-values
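Before switching, confirm the nodes actually expose BTF; CO-RE needs /sys/kernel/btf/vmlinux. A minimal check to run on the node or in a privileged debug pod (`btf_available` is a hypothetical helper; the path argument exists only so it can be exercised off-node):

```shell
# btf_available: does this node expose kernel BTF for CO-RE?
btf_available() { [ -e "${1:-/sys/kernel/btf/vmlinux}" ]; }

btf_available \
  && echo "BTF present: modern_ebpf viable" \
  || echo "no BTF: stay on legacy ebpf/kmod"
```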
Pattern: Investigating a Specific Falco Alert¶
A Falco alert fires: Terminal shell in container on a payments pod in production.
# 1. Get the full alert JSON
kubectl logs -n falco -l app.kubernetes.io/name=falco --since=1h | \
jq 'select(.rule == "Terminal shell in container" and (.output_fields."k8s.pod.name" | startswith("payments-")))'
# Output:
# {
# "rule": "Terminal shell in container",
# "priority": "NOTICE",
# "time": "2024-03-15T14:32:01.234Z",
# "output_fields": {
# "container.id": "abc123",
# "container.name": "payments",
# "k8s.pod.name": "payments-xxx-yyy",
# "k8s.ns.name": "production",
# "proc.name": "bash",
# "proc.pname": "kubectl",
# "proc.cmdline": "bash",
# "user.name": "root",
# "proc.tty": 34816
# }
# }
# 2. Cross-reference with Kubernetes audit logs
# Who ran kubectl exec?
jq 'select(.verb == "create" and .objectRef.subresource == "exec" and .objectRef.name == "payments-xxx-yyy")' \
/var/log/kubernetes/audit.log | jq '{user: .user.username, time: .requestReceivedTimestamp}'
# 3. Check what the shell did (if Falco has syscall-level rules)
kubectl logs -n falco -l app.kubernetes.io/name=falco --since=1h | \
jq 'select(.output_fields."container.id" == "abc123") | {rule: .rule, time: .time, cmd: .output_fields."proc.cmdline"}'
# 4. Capture container state for forensics (if container still running)
kubectl exec -n production payments-xxx-yyy -- ps aux
kubectl exec -n production payments-xxx-yyy -- netstat -an
kubectl exec -n production payments-xxx-yyy -- find /tmp -mmin -60 -ls 2>/dev/null  # files modified in the last hour
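For handoff to an incident channel, the alert fields above flatten into a one-line summary; a sketch with the sample alert inlined (in practice, pipe the `kubectl logs | jq select` output instead):

```shell
# One-line triage summary: priority | rule | ns/pod | user | command
alert='{"rule":"Terminal shell in container","priority":"Notice","output_fields":{"user.name":"root","proc.cmdline":"bash","k8s.ns.name":"production","k8s.pod.name":"payments-xxx-yyy"}}'
echo "$alert" | jq -r '[.priority, .rule,
  .output_fields."k8s.ns.name" + "/" + .output_fields."k8s.pod.name",
  .output_fields."user.name", .output_fields."proc.cmdline"] | join(" | ")'
# → Notice | Terminal shell in container | production/payments-xxx-yyy | root | bash
```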
Pattern: Writing a Custom Rule for Your Environment¶
Your environment policy: no container should ever write to /etc/hosts (suspicious — could indicate DNS poisoning or host file manipulation).
# falco_rules.local.yaml
- list: allowed_etc_writers
items: [] # no process is allowed
- rule: Write to /etc/hosts in container
desc: >
A process inside a container wrote to /etc/hosts. This is unusual and may
indicate an attacker attempting to redirect DNS resolution within the container
or the pod network.
condition: >
open_write
and container
and fd.name = /etc/hosts
and not proc.name in (allowed_etc_writers)
output: >
Write to /etc/hosts in container
(user=%user.name
command=%proc.cmdline
file=%fd.name
container=%container.name
image=%container.image.repository:%container.image.tag
k8s.pod=%k8s.pod.name
k8s.ns=%k8s.ns.name)
priority: WARNING
tags: [container, filesystem, dns, mitre_defense_evasion]
# Test the rule fires (run this in a test namespace, not production)
kubectl run test-write --image=alpine --rm -it --restart=Never -- \
sh -c "echo '1.2.3.4 evil.example.com' >> /etc/hosts"
# Check Falco logs for the alert
kubectl logs -n falco -l app.kubernetes.io/name=falco --tail=10 | \
jq 'select(.rule == "Write to /etc/hosts in container")'
Pattern: Routing Alerts by Priority to Different Channels¶
# falcosidekick-values.yaml for Helm
falcosidekick:
config:
# CRITICAL and ALERT → PagerDuty (immediate on-call)
pagerduty:
routingKey: "your-pd-routing-key"
minimumpriority: "critical"
# WARNING and above → Slack #security-alerts
slack:
webhookurl: "https://hooks.slack.com/services/..."
channel: "#security-alerts"
minimumpriority: "warning"
# Only include these fields in Slack message
messageformat: >-
*Rule*: `{{ .Rule }}` | *Priority*: {{ .Priority }}
*Container*: `{{ index .OutputFields "container.name" }}`
*Pod*: `{{ index .OutputFields "k8s.pod.name" }}` in `{{ index .OutputFields "k8s.ns.name" }}`
*Command*: `{{ index .OutputFields "proc.cmdline" }}`
# Everything DEBUG and above → Loki for historical search
loki:
hostport: "http://loki.monitoring.svc.cluster.local:3100"
minimumpriority: "debug"
tenant: "falco"
# CRITICAL → OpsGenie for backup paging
opsgenie:
apikey: "your-opsgenie-key"
region: "us"
minimumpriority: "critical"
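`minimumpriority` gates each output independently: an alert is forwarded only if its priority ranks at or above the threshold in Falco's priority order (emergency > alert > critical > error > warning > notice > informational > debug). A sketch of that gate; `priority_rank` and `meets_minimum` are hypothetical helpers, not part of Falcosidekick:

```shell
# priority_rank: lower number = more severe (Falco's priority order)
priority_rank() {
  case "$(echo "$1" | tr '[:upper:]' '[:lower:]')" in
    emergency) echo 0 ;; alert) echo 1 ;; critical) echo 2 ;; error) echo 3 ;;
    warning) echo 4 ;; notice) echo 5 ;; informational) echo 6 ;; debug) echo 7 ;;
    *) echo 8 ;;
  esac
}
# meets_minimum <alert_priority> <minimumpriority>: should this output fire?
meets_minimum() {
  [ "$(priority_rank "$1")" -le "$(priority_rank "$2")" ]
}

meets_minimum Critical warning && echo "route to Slack"      # Critical clears the warning bar
meets_minimum Notice warning || echo "suppressed for Slack"  # Notice does not
```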
Scenario: Falcosidekick UI Investigation¶
# Port-forward the Falcosidekick UI
kubectl port-forward -n falco svc/falco-falcosidekick-ui 2802:2802
# Open: http://localhost:2802
# In the UI you can:
# - View real-time alert stream
# - Filter by rule, priority, namespace, pod
# - See alert trend over time
# - Search historical alerts (backed by the configured outputs)
# Via API (Falcosidekick health and stats)
kubectl port-forward -n falco svc/falco-falcosidekick 2801:2801
curl http://localhost:2801/ping
# {"status": "ok"}
curl http://localhost:2801/stats | jq .
# Shows alert counts per output, per priority
curl http://localhost:2801/metrics | grep falcosidekick_events
# Prometheus metrics for Falcosidekick itself
Emergency: Cryptomining Process Detected at Runtime¶
Falco fires: Crypto mining outbound connection from a container in the batch namespace.
# 1. Immediately isolate the pod (remove it from Service endpoints)
# Remove the label the Service selects on (assumed here to be "app") so only this
# pod drops out of the endpoints, and mark it quarantine=true for follow-up.
# Do NOT patch the Service selector instead — that would drop EVERY pod.
kubectl label pod suspicious-batch-xxx -n batch app- quarantine=true
# 2. Capture forensic state BEFORE terminating
POD=suspicious-batch-xxx
NS=batch
# Process list
kubectl exec -n $NS $POD -- ps auxww > /tmp/forensics-ps.txt 2>&1
# Network connections
kubectl exec -n $NS $POD -- ss -antp > /tmp/forensics-netstat.txt 2>&1
# Open files
kubectl exec -n $NS $POD -- ls -la /proc/1/fd > /tmp/forensics-fds.txt 2>&1
# Files in /tmp
kubectl exec -n $NS $POD -- find /tmp -ls > /tmp/forensics-tmp.txt 2>&1
# Image reference (save it for the trivy scan in step 7)
kubectl get pod $POD -n $NS -o jsonpath='{.spec.containers[0].image}'
# 3. Get the full Falco alert history for this container
kubectl logs -n falco -l app.kubernetes.io/name=falco --since=24h | \
jq --arg pod "$POD" 'select(.output_fields."k8s.pod.name" == $pod)'
# 4. Check what image this pod is running and where it came from
kubectl get pod $POD -n $NS -o yaml | grep -E "image:|imagePullPolicy|serviceAccountName"
# 5. Terminate the pod
kubectl delete pod $POD -n $NS --grace-period=0 --force
# 6. Check if other pods in the namespace are affected
kubectl logs -n falco -l app.kubernetes.io/name=falco --since=24h | \
jq --arg ns "$NS" 'select(.output_fields."k8s.ns.name" == $ns) | {rule: .rule, pod: .output_fields."k8s.pod.name"}'
# 7. Scan the image that was running for known malware
# The pod is gone by now; use the image reference captured in step 2.
# trivy pulls the image from the registry, so node-side image GC does not matter
trivy image --severity CRITICAL,HIGH <image-captured-in-step-2>
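The quarantine=true label applied in step 1 can also drive network isolation: endpoint removal stops Service traffic, but not the miner's own egress. A deny-all NetworkPolicy keyed on that label closes the gap (a sketch; assumes a CNI that enforces NetworkPolicy):

```yaml
# quarantine-netpol.yaml — deny all ingress and egress for quarantined pods
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: quarantine-deny-all
  namespace: batch
spec:
  podSelector:
    matchLabels:
      quarantine: "true"
  policyTypes: [Ingress, Egress]   # no rules listed under either = deny all
```

Apply it once per namespace; any pod subsequently labeled quarantine=true is cut off immediately.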
Useful One-Liners¶
# Watch alerts in real time (formatted)
kubectl logs -n falco -l app.kubernetes.io/name=falco -f | \
jq -r '[.time, .priority, .rule, .output_fields."k8s.pod.name"] | @csv'
# Count CRITICAL alerts in last 24h
kubectl logs -n falco -l app.kubernetes.io/name=falco --since=24h | \
jq 'select(.priority == "Critical")' | wc -l
# List all unique rule names firing in the last hour
kubectl logs -n falco -l app.kubernetes.io/name=falco --since=1h | \
jq -r '.rule' | sort -u
# Filter alerts by namespace
kubectl logs -n falco -l app.kubernetes.io/name=falco --since=1h | \
jq 'select(.output_fields."k8s.ns.name" == "production")'
# Get all alerts for a specific pod
kubectl logs -n falco -l app.kubernetes.io/name=falco --since=24h | \
jq 'select(.output_fields."k8s.pod.name" | startswith("myapp-"))'
# Reload Falco rules hot (no pod restart)
kubectl exec -n falco $(kubectl get pod -n falco -l app.kubernetes.io/name=falco -o name | head -1) -- \
kill -HUP 1
# Test rule syntax before deploying
falco --validate /path/to/my-rules.yaml && echo "Rules valid"
# Check Falcosidekick is receiving events
kubectl port-forward -n falco svc/falco-falcosidekick 2801:2801 &
curl -s http://localhost:2801/stats | jq '.outputs | to_entries[] | {output: .key, events: .value.sent}'