Diagnostic Questions

Before revealing the investigation path:

  1. The alert says the payment-service container is OOMKilled. What is your first troubleshooting step? Would you immediately profile the application's memory usage, or check something else first?

  2. kubectl top pod --containers shows the app using 118Mi/256Mi and the sidecar using 127Mi/128Mi. How does Kubernetes decide which container to OOMKill when the pod is under memory pressure? Could the OOMKill attribution in the event be misleading?

  3. The sidecar memory usage started climbing 2 hours ago, which coincides with a Helm deployment. What commands would you run to determine what changed in the sidecar's configuration between the old and new deployment?

  4. The root cause is a trace sampling change from 1% to 100% in the Helm values. Why is the correct fix in the Helm values file (DevOps tooling) rather than increasing the sidecar's memory limit (Kubernetes) or reconfiguring the tracing backend (observability)?

  5. What guardrails would you put in place to prevent an observability configuration change from causing a production outage? Consider both the deployment pipeline and runtime monitoring.
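For questions 3 and 4, the change can be located by comparing Helm revisions, e.g. `helm history <release>` to find the revision deployed two hours ago, then `helm get values <release> --revision N --all` for the old and new revisions (or the `helm diff` plugin). One plausible shape of the offending value, purely for illustration (the chart structure and key names here are assumptions, not from the source):

```yaml
# values.yaml for the payment-service chart (key names are illustrative)
tracing:
  sidecar:
    # The recent deployment flipped this from 0.01 to 1.0, multiplying
    # the sidecar's in-memory span buffer roughly 100x.
    samplingRate: 0.01   # revert from 1.0 back to 1%
```

Reverting the value removes the memory pressure at its source; raising the sidecar's memory limit would only mask a 100x increase in trace volume, and reconfiguring the backend would not shrink the buffer the sidecar holds before export.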
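For question 2, the key fact is that memory limits are enforced per container (each container gets its own cgroup limit), so a container is killed when it exceeds *its own* limit, not when the pod's total crosses some threshold. A minimal sketch of that reasoning, using the usage/limit figures quoted in question 2 (this is an illustrative model, not the kernel's actual OOM killer, which also weighs factors like `oom_score_adj`):

```python
def mib(s: str) -> int:
    """Parse a kubectl-style quantity like '118Mi' into an integer MiB."""
    return int(s.rstrip("Mi"))

# Figures from the scenario's `kubectl top pod --containers` output.
containers = {
    "payment-service": {"usage": mib("118Mi"), "limit": mib("256Mi")},
    "sidecar":         {"usage": mib("127Mi"), "limit": mib("128Mi")},
}

def headroom(name: str) -> int:
    """MiB remaining before this container hits its own cgroup limit."""
    c = containers[name]
    return c["limit"] - c["usage"]

# The container closest to its own limit is the realistic OOMKill
# candidate, regardless of which name the pod-level event highlights.
at_risk = min(containers, key=headroom)
print(at_risk, headroom(at_risk))  # → sidecar 1
```

The app has 138Mi of headroom while the sidecar has 1Mi, which is why the event's attribution can be misleading: the pod restarts, the alert names the pod's main container, but the limit actually being breached belongs to the sidecar.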