Interview Scenarios

DevOps/SRE interview scenarios tied to the runtime labs and runbooks in this repository. Each scenario simulates a real incident and includes:

How to Use

#	Scenario	Difficulty	Lab	Runbook
1	Deployment stuck progressing	Medium	lab-runtime-01	readiness_probe_failed
2	HPA not scaling under load	Medium	lab-runtime-02	hpa_not_scaling
3	Prometheus says target down	Medium	lab-runtime-03	prometheus_target_down
4	Logs disappeared from Grafana Loki	Medium	lab-runtime-04	loki_no_logs
5	Helm upgrade broke prod	Medium	lab-runtime-05	helm_upgrade_failed
6	CI failed due to vulnerability scan	Easy	lab-runtime-06	n/a
7	Config drift detected in production	Hard	lab-runtime-07	n/a
8	Pods OOMKilled under load	Medium	lab-runtime-08	oomkilled
9	RBAC forbidden during deploy	Medium	n/a	rbac_forbidden
10	Ingress returns 404 intermittently	Medium	n/a	ingress_404
11	Server won't POST in the data center	Hard	n/a	n/a
12	TLS certificate expired	Medium	n/a	n/a
13	Secret leaked to Git	Hard	n/a	n/a
14	etcd database space exceeded	Hard	n/a	etcd_backup_restore
15	100% 503s after Istio rollout	Medium	n/a	n/a
16	GitOps drift causing outage	Medium	n/a	n/a
17	Database failover during deploy	Hard	n/a	n/a
18	Policy engine blocking all deploys	Medium	n/a	n/a
19	Vault tokens expired across services	Hard	n/a	n/a
20	Cloud cost spike investigation	Medium	n/a	n/a
21	Linux server running slow	Medium	n/a	n/a
22	Docker container won't start in prod	Medium	n/a	n/a