Portal | Level: L2: Operations | Topics: Kubernetes Networking | Domain: Kubernetes

Scenario: Ingress Returns 404 Intermittently

The Prompt

"Users report getting 404 errors intermittently when accessing our app through the ingress. Some requests work, some don't. The app pods are all running. What's going on?"

Initial Report

Customer support escalation: "Multiple customers are reporting that the app loads sometimes but gives a 'Not Found' page other times. It seems random. Started about 30 minutes ago."

Constraints

  • Time pressure: You have 15 minutes before the next escalation. Customer-facing errors are actively occurring.
  • Limited access: You have read access to the ingress and application namespace. Modifying the ingress controller (Traefik in kube-system) requires platform team involvement.

Observable Evidence

  • Dashboard: HTTP 404 rate is at ~30% of total requests. 200 responses still account for ~70%. Error rate correlates with specific backend pod IPs.
  • Ingress controller logs: Traefik logs show 404 page not found responses originating from one specific upstream pod IP.
  • Endpoints: kubectl get endpoints grokdevops shows 3 pod IPs, but one pod was recently restarted and may be serving stale routes.
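The ~30% failure rate is itself a clue: with three endpoints behind a round-robin load balancer, one misbehaving pod produces roughly one failed request in three. A minimal sketch (pure shell; the status codes are hypothetical stand-ins for the three pods):

```shell
# Simulate round-robin across three backends; one (hypothetical) pod
# always answers 404, the other two answer 200.
backends=(200 200 404)
total=0
errors=0
for i in $(seq 1 300); do
  code=${backends[$((i % 3))]}
  total=$((total + 1))
  if [ "$code" = "404" ]; then
    errors=$((errors + 1))
  fi
done
# One bad pod out of three yields ~33%, matching the ~30% on the dashboard.
echo "error rate: $((100 * errors / total))%"
```

If the observed error rate sits near 1/N for N endpoints, suspect a single bad backend before suspecting the ingress rules themselves.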

Expected Investigation Path

# 1. Check ingress config
kubectl get ingress -n grokdevops -o yaml

# 2. Check endpoints — are all pods in the endpoint list?
kubectl get endpoints grokdevops -n grokdevops
kubectl get pods -n grokdevops -o wide

# 3. Check if some pods are not Ready (note: this checks the first container only)
kubectl get pods -n grokdevops -o custom-columns='NAME:.metadata.name,READY:.status.containerStatuses[0].ready,IP:.status.podIP'

# 4. Check ingress controller logs
kubectl logs -n kube-system -l app.kubernetes.io/name=traefik --tail=50

# 5. Test individual pods
for ip in $(kubectl get endpoints grokdevops -n grokdevops -o jsonpath='{.subsets[0].addresses[*].ip}'); do
  echo "Testing $ip:"
  kubectl run curl-test-$RANDOM -n grokdevops --rm -i --restart=Never --image=curlimages/curl -- curl -s http://$ip:8000/health
done
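Steps 2 and 3 can also be combined into a single comparison: diff the IPs listed in the Endpoints object against the IPs of Ready pods. Below, sample data (hypothetical IPs) stands in for the live kubectl output; on the cluster you would substitute the jsonpath queries shown above:

```shell
# Sample data standing in for live kubectl output (hypothetical IPs).
endpoint_ips="10.42.0.5 10.42.0.6 10.42.0.7"   # from the Endpoints object
ready_pod_ips="10.42.0.5 10.42.0.7"            # from Ready pods only

# Any endpoint IP with no matching Ready pod is a candidate bad backend.
for ip in $endpoint_ips; do
  case " $ready_pod_ips " in
    *" $ip "*) echo "$ip: ready" ;;
    *)         echo "$ip: in endpoints but NOT ready (likely 404 source)" ;;
  esac
done
```

In this sketch, 10.42.0.6 would be flagged: it is still receiving traffic from the ingress controller but is not backed by a Ready pod.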

Strong Answer

"Intermittent 404s with the ingress suggest the ingress controller is load-balancing across pods, and some of them are returning 404. This could mean: (1) some pods are running a different version with different routes — check if a rolling update is in progress; (2) the readiness probe is too lenient — pods are marked Ready before routes are registered; (3) one pod has a corrupted or misconfigured state. I'd check the endpoint list to see which pod IPs are included, then test each pod directly. If only some pods return 404, I'd compare their logs and configs. If all pods work individually, the issue might be in the ingress path configuration — specifically, pathType: Prefix vs Exact can cause unexpected behavior, and trailing slashes can matter depending on the ingress controller."
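The pathType point deserves a concrete illustration. In this hypothetical fragment (resource names and host are assumed, not taken from this cluster), an Exact rule matches only the literal path, so /app/ or /app/index.html falls through to a 404, while Prefix matches the whole subtree:

```yaml
# Hypothetical Ingress fragment illustrating pathType behavior.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: grokdevops
  namespace: grokdevops
spec:
  rules:
    - host: app.example.com
      http:
        paths:
          # Exact: matches /app only; /app/ and /app/anything return 404
          - path: /app
            pathType: Exact
            backend:
              service:
                name: grokdevops
                port:
                  number: 8000
          # Prefix: matches /app, /app/, and /app/anything
          - path: /app
            pathType: Prefix
            backend:
              service:
                name: grokdevops
                port:
                  number: 8000
```

Note that trailing-slash handling for Prefix is defined element-by-element on the path, and controllers can differ at the edges, which is why testing the exact failing URLs against each pod matters.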

Common Traps

  • Blaming DNS — DNS issues cause connection failures, not 404s
  • Not testing individual pods — you need to isolate which pod(s) are misbehaving
  • Ignoring rolling updates — during an update, old and new pods coexist
  • Not checking pathType — Prefix vs Exact is a subtle but common source of 404s

Related Resources

  • Runbook: training/library/runbooks/kubernetes/ingress_404.md
  • Drills: training/library/drills/kubectl_drills.md — Drill 7 (endpoints), Drill 23 (ingress rules)
  • Quest: training/interactive/exercises/levels/level-24/k8s-ingress/
