# Runtime Security with Falco: Footguns
Mistakes that cause missed detections, alert storms, probe failures, and false confidence in runtime security coverage.
## 1. Deploying Default Rules to Production Without Tuning
You deploy Falco with default rules. Within minutes, your Slack channel receives 4,000 alerts. `Read sensitive file untrusted` fires for Vault agent, consul-template, and datadog-agent, all of them legitimate. Engineers start ignoring the channel. Within a week, all Falco alerts are muted because "it cries wolf." A real intrusion goes unnoticed for six days.
Fix: Run Falco in count mode for 24 hours before routing to alerting systems. Identify the top 10 noisy rules, add legitimate callers to the appropriate lists or macros in falco_rules.local.yaml, and only then enable alerting. Alert fatigue is the primary way runtime security is defeated.
War story: The Target breach post-mortem (2013) revealed that security alerts for the intrusion were generated but ignored because the security team was overwhelmed by thousands of daily false positives. The same pattern applies to Falco: if your team learns to ignore the Slack channel because of noise, a real `Container escape via privileged mount` alert will scroll past unnoticed. The first 24-48 hours of tuning before enabling alerts is non-negotiable.
```bash
# Count alerts by rule for 24h before enabling alerting
# (requires json_output: true in falco.yaml so each log line is parseable JSON)
kubectl logs -n falco -l app.kubernetes.io/name=falco --since=24h | \
  jq -r '.rule' | sort | uniq -c | sort -rn | head -20
```
## 2. Editing the Base Rules File Instead of Creating Local Overrides
You find a rule that fires too noisily and edit /etc/falco/falco_rules.yaml directly — either on the node or inside a ConfigMap. The next Helm upgrade overwrites your changes. The noisy rules return. Your tuning is lost.
Fix: Never edit falco_rules.yaml. All customization goes in falco_rules.local.yaml (which survives upgrades) using append: true to add to existing lists and macros, or by overriding rule conditions. This file is explicitly loaded after the base rules and takes precedence.
```yaml
# falco_rules.local.yaml — safe, upgrade-proof overrides

# Extend an existing list
- list: allowed_shell_containers
  items: [my-debug-pod, vault-agent-init]
  append: true

# Extend an existing macro
- macro: trusted_programs_reading_sensitive_files
  condition: or proc.name in (vault-agent, consul-template)
  append: true
```
## 3. Kernel Module Breaks on Node Kernel Upgrade
Your cluster runs Falco with the kernel module (.ko) driver. A node kernel upgrade rolls out via automatic OS patching. The existing .ko file is incompatible with the new kernel. Falco on that node fails to start. You have zero runtime visibility on those nodes — but the DaemonSet still shows as "Running" because the pod is up even though the driver isn't loaded.
Fix: Switch to the eBPF modern probe (driver.kind: modern_ebpf) on kernels ≥ 5.8. The modern probe uses BPF CO-RE (Compile Once, Run Everywhere) and does not need to be rebuilt per kernel version. Also: add a Falco liveness probe that checks the driver is actually loaded:
```bash
# Check the Falco logs for driver/probe load errors on a node
kubectl logs -n falco <falco-pod> | grep -iE "probe|driver|bpf|kmod"

# Confirm the Falco binary responds (note: this alone does not prove the
# driver loaded; always check the logs above as well)
kubectl exec -n falco <falco-pod> -- falco --version
```
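With the official Helm chart, switching drivers is a one-line values change. A minimal sketch, assuming the key names of the falcosecurity/falco chart:

```yaml
# values.yaml for the falcosecurity/falco Helm chart
driver:
  kind: modern_ebpf   # BPF CO-RE: no per-kernel rebuild (kernels >= 5.8)
```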
## 4. Syscall Event Drops — Silent Gaps in Coverage
Falco's kernel ring buffer fills faster than the userspace engine can drain it. Events are silently dropped. The Falco log shows "syscall event drop" warnings but you have them configured to just log, not alert. You have blind spots in your runtime coverage without knowing it.
Fix: Alert on syscall event drops. Configure syscall_event_drops.actions: [log, alert]. Monitor the falco_events_processed_total vs falco_events_dropped_total Prometheus metrics. If drop rate is non-zero, either reduce rule complexity, increase the ring buffer size, or use the modern eBPF probe which handles back-pressure better.
```yaml
# falco.yaml / Helm values
falco:
  syscall_event_drops:
    actions:
      - log
      - alert   # add this; otherwise drops are only logged, never surfaced
    rate: 0.03333
    max_burst: 1
```
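To make drops actually page someone, a Prometheus rule along these lines works. A sketch only: the drop-counter metric name varies by Falco/exporter version, so verify it against your `/metrics` endpoint before relying on this.

```yaml
# Prometheus rule — fire when any syscall events were dropped recently
# (metric name as cited above; confirm it for your setup)
- alert: FalcoSyscallEventDrops
  expr: increase(falco_events_dropped_total[5m]) > 0
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: "Falco is dropping syscall events; runtime coverage has blind spots"
```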
## 5. Not Watching the K8s Audit Log Source Separately
Falco can consume Kubernetes API audit events via its k8s_audit event source — but this requires a separate webhook configuration on the API server. If you only deploy Falco without configuring the K8s audit webhook, rules with source: k8s_audit (like kubectl exec into production pod) will never fire. You think you have coverage for API-level events but you do not.
Fix: Configure the Kubernetes API server to forward audit events to Falco (or use the Falco K8s audit plugin). This is a separate setup step from the main DaemonSet deployment:
```yaml
# kube-apiserver flags:
#   --audit-webhook-config-file=/etc/kubernetes/falco-webhook.yaml
#   --audit-policy-file=/etc/kubernetes/audit-policy.yaml

# falco-webhook.yaml (kubeconfig-format webhook definition)
apiVersion: v1
kind: Config
clusters:
  - name: falco
    cluster:
      server: http://falco-service.falco.svc:9765/k8s-audit
```
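The webhook only receives what the audit policy lets through. A minimal sketch of a policy that captures the events the `k8s_audit` rules care about most; the resource list and levels are assumptions to tune for your cluster:

```yaml
# audit-policy.yaml — capture pod mutations/exec in full, everything else as metadata
apiVersion: audit.k8s.io/v1
kind: Policy
rules:
  - level: RequestResponse
    resources:
      - group: ""
        resources: ["pods", "pods/exec", "secrets"]
  # Fall-through: keep volume manageable for all other requests
  - level: Metadata
```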
## 6. Running Falco in Privileged Mode Without Understanding the Attack Surface
You deploy the Falco DaemonSet with securityContext.privileged: true to allow the kernel module to load. This is required for the kernel module driver — but you leave it enabled for the eBPF driver too, where it is not needed. You have a privileged pod running on every node, which is a significant security boundary failure if Falco itself is ever compromised.
Fix: Use the minimum required permissions for your driver mode:
- modern_ebpf: needs only CAP_BPF, CAP_PERFMON, CAP_SYS_RESOURCE — not privileged
- ebpf legacy: needs CAP_SYS_ADMIN — not fully privileged but significant
- kmod: needs privileged — but you should be moving off kmod
The official Helm chart sets these correctly per driver mode. If you are overriding securityContext, verify you are not granting more than necessary.
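For reference, a container securityContext consistent with the modern_ebpf capability list above might look like the following. A sketch only; prefer letting the official chart set this, and verify the exact capability set against your chart version:

```yaml
# DaemonSet container securityContext for driver.kind: modern_ebpf
securityContext:
  privileged: false
  capabilities:
    drop: [ALL]
    add: [BPF, PERFMON, SYS_RESOURCE]   # per the list above; confirm against the chart
```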
## 7. Alerting Everything to One Channel With No Priority Routing
All Falco alerts — from DEBUG informational events to CRITICAL active exploits — go to the same Slack channel. The channel receives 300 messages per hour. On-call engineers learn to ignore it. A CRITICAL `Container escape via privileged mount` alert scrolls by unacknowledged.

Fix: Route by priority. Configure Falcosidekick to send CRITICAL/ALERT to PagerDuty (pages the on-call), WARNING/NOTICE to Slack (visible but not paging), and DEBUG/INFO to Loki only (searchable but not noisy). Set `minimumpriority` per output.
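In Falcosidekick's configuration this is one `minimumpriority` key per output. A sketch; the webhook URL, routing key, and Loki host are placeholders:

```yaml
# falcosidekick config.yaml — per-output priority routing
slack:
  webhookurl: https://hooks.slack.com/services/XXX   # placeholder
  minimumpriority: warning    # WARNING/NOTICE and above, visible but not paging
pagerduty:
  routingkey: "<routing-key>"                        # placeholder
  minimumpriority: critical   # CRITICAL/ALERT pages the on-call
loki:
  hostport: http://loki.monitoring.svc:3100          # placeholder
  minimumpriority: debug      # everything lands in Loki, searchable
```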
## 8. No Alerting on Falco Process Failure Itself
Falco crashes — OOM, probe incompatibility, configuration error. The DaemonSet restarts it, but there is a 30-second gap. Or the pod enters CrashLoopBackOff and your cluster has zero runtime security. You find out three days later when you look at the DaemonSet for an unrelated reason.
Fix: Monitor Falco's own operational health:
```yaml
# Prometheus alert: Falco pod not running on all nodes
groups:
  - name: falco-health
    rules:
      - alert: FalcoDaemonSetNotFullyCovered
        expr: >
          kube_daemonset_status_desired_number_scheduled{daemonset="falco",namespace="falco"}
          - kube_daemonset_status_number_ready{daemonset="falco",namespace="falco"} > 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Falco DaemonSet has {{ $value }} pods not ready; runtime security gap"
```
## 9. Applying a Custom Rule With Broken Syntax — Takes Down All Rules
You add a custom rule to falco_rules.local.yaml with a typo — an unclosed parenthesis in the condition. Falco fails to parse the rules file and loads zero rules. No alerts fire for any event. You have no runtime security at all until someone notices and fixes the syntax.
Fix: Always validate rules before deploying. Use falco --validate in CI:
```bash
# In your CI pipeline, before applying the ConfigMap
falco --validate /path/to/falco_rules.local.yaml
# Exit code 0 = valid, non-zero = syntax error

# Or in Kubernetes using a test pod (mount your candidate rules file into the
# pod, e.g. from a ConfigMap, so the path below contains the file under test)
kubectl run falco-validate --rm -it --image=falcosecurity/falco-no-driver:latest \
  --restart=Never -- \
  falco --validate /etc/falco/falco_rules.local.yaml
```
## 10. Believing Image Scanning + Falco = Complete Security
Your security program has image scanning and Falco runtime detection. Leadership concludes the container security posture is complete. But: Falco does not detect vulnerabilities in running code that don't manifest as anomalous syscalls. A SQL injection exploited entirely through the application layer makes no syscall Falco would flag. An SSRF making HTTP calls to the metadata API looks like legitimate outbound traffic.
Fix: Understand Falco's actual detection model — it detects anomalous system behavior (unexpected processes, file access, network destinations), not application-layer attacks. Defense in depth still requires: WAF, application security testing, network policies, RBAC least-privilege, dependency scanning, and secrets management. Falco is one layer, not a complete solution.
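As one concrete non-Falco layer for the SSRF example above, a NetworkPolicy can cut off pod egress to the cloud metadata API entirely. A sketch; the namespace is hypothetical and the metadata IP shown is the common cloud-provider endpoint:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: deny-metadata-egress
  namespace: prod               # hypothetical namespace
spec:
  podSelector: {}               # applies to every pod in the namespace
  policyTypes: [Egress]
  egress:
    - to:
        - ipBlock:
            cidr: 0.0.0.0/0
            except: [169.254.169.254/32]   # cloud metadata endpoint
```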
## 11. Not Correlating Falco Alerts With Other Sources
Falco fires `Contact K8S API Server From Container` for a container that ran kubectl from inside a pod. This could be a legitimate operator tool or an attacker using a compromised pod to enumerate the cluster. Without correlating with K8s audit logs (who deployed this pod?), RBAC (what permissions does its ServiceAccount have?), and network logs (what did it connect to next?), you cannot distinguish legitimate from malicious.
Fix: Ship Falco alerts to Loki or Elasticsearch alongside K8s audit logs. Build correlated queries that link Falco events to the pod's RBAC permissions and deployment provenance:
```logql
# Loki query: Falco alerts for the rule in question
{job="falco"} | json | rule="Contact K8S API Server From Container"

# Then cross-reference: who ran `kubectl apply`/`create` for that pod?
{job="k8s-audit"} | json | verb="create" | objectRef_resource="pods" | objectRef_name="<pod-name>"
```