- devops
- l2
- scenario
- k8s-core
- prometheus
- cicd
- tcp-ip
- linux-fundamentals

---

Portal | Level: L2: Operations | Topics: Kubernetes Core, Prometheus, CI/CD, TCP/IP | Domain: DevOps & Tooling
Adversarial Interview Gauntlet
Thirty escalating interview sequences, each five rounds deep, in which every question builds on the previous answer. The interviewer probes, challenges assumptions, and introduces constraints. Some sequences include "trap" questions where the right answer is "I don't know, but here's how I'd find out."
How to Use
- Read only the Round 1 question and formulate your answer before reading further
- At each round, the interviewer challenges your previous answer — adapt in real time
- Pay attention to Trap Alerts — these mark questions where bluffing is worse than honesty
- Study the Senior Signal callouts to understand what separates good from great
- Review the skills-tested table at the end to identify your gaps
Categories
System Design (6 sequences)
Design a system, then defend every decision under constraint pressure.
| # | Sequence | Domains | Difficulty |
|---|---|---|---|
| 1 | Log Aggregation Pipeline | Observability, Kubernetes | L2-L3 |
| 2 | CI/CD for a Monorepo | CI/CD, Deployment Strategy | L2-L3 |
| 3 | Secrets Management System | Security, Compliance | L2-L3 |
| 4 | Monitoring Stack from Scratch | Observability, SRE | L2-L3 |
| 5 | Multi-Region Kubernetes | Kubernetes, Networking | L3 |
| 6 | Container Image Pipeline | Containers, Supply Chain Security | L2-L3 |
Incident Response (6 sequences)
Triage a live incident, then face cascading failures.
| # | Sequence | Domains | Difficulty |
|---|---|---|---|
| 7 | API Returning 503s | Networking, DNS | L2-L3 |
| 8 | Disk Usage on Prod Database | Database, Replication | L2-L3 |
| 9 | Pods Crash-Looping | Kubernetes, Linux Kernel | L2-L3 |
| 10 | Deploy Succeeded but Old Version Visible | CDN, Service Mesh | L2-L3 |
| 11 | Alerts Firing but System Seems Fine | Monitoring, Metrics Pipeline | L2-L3 |
| 12 | Customer Reports Data Inconsistency | Caching, Consistency | L2-L3 |
Debugging Live Systems (6 sequences)
Start with a symptom, go deeper with each round.
| # | Sequence | Domains | Difficulty |
|---|---|---|---|
| 13 | Container Using 2x Expected Memory | JVM, Containers | L2-L3 |
| 14 | Network Latency Spikes Every 30s | Networking, Linux Kernel | L2-L3 |
| 15 | Terraform Plan Shows 47 Destroys | Terraform, State Management | L2-L3 |
| 16 | Ansible Playbook 9x Slower | Ansible, LDAP | L2-L3 |
| 17 | Intermittent gRPC Failures | gRPC, Load Balancing | L2-L3 |
| 18 | Flaky CI Build | CI/CD, Linux cgroups | L2-L3 |
Architecture Trade-offs (6 sequences)
Propose an approach, defend it against alternatives.
| # | Sequence | Domains | Difficulty |
|---|---|---|---|
| 19 | Should We Use a Service Mesh? | Service Mesh, Networking | L2-L3 |
| 20 | Monolith or Microservices? | Architecture, Org Design | L2-L3 |
| 21 | Kubernetes or Simpler Orchestrator? | Kubernetes, Platform | L2-L3 |
| 22 | GitOps or Traditional CI/CD? | GitOps, Deployment | L2-L3 |
| 23 | Managed Database or Self-Hosted? | Database, Cloud | L2-L3 |
| 24 | eBPF for Observability? | eBPF, Observability | L3 |
Behavioral + Technical Hybrid (6 sequences)
"Tell me about a time..." then drill into the technical details.
| # | Sequence | Domains | Difficulty |
|---|---|---|---|
| 25 | Handling a Production Incident | Incident Response, Communication | L2-L3 |
| 26 | Disagreeing with a Technical Decision | Architecture, Leadership | L2-L3 |
| 27 | Learning Something Quickly | Learning, Technical Depth | L2-L3 |
| 28 | Your Approach to On-Call | SRE, On-Call | L2-L3 |
| 29 | When Automation Went Wrong | Automation, Risk Management | L2-L3 |
| 30 | Improving Team Development Workflow | Developer Experience, Metrics | L2-L3 |
Difficulty Guide
- L2 (Operations): You can operate and troubleshoot production systems
- L3 (Advanced): You can design, scale, and make strategic technical decisions
All sequences start at L2 (Rounds 1-2) and escalate to L3 (Rounds 3-5).
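The round-to-level mapping can be sketched as a small data model. This is a minimal illustration, not part of the gauntlet itself: the names `Round`, `round_level`, and `build_sequence` are hypothetical, and the curveball position is taken from the Round 4 note in the interviewer tips.

```python
from dataclasses import dataclass

ROUNDS_PER_SEQUENCE = 5  # each sequence goes five rounds deep
CURVEBALL_ROUND = 4      # assumption drawn from the interviewer tips: Round 4 is the curveball


@dataclass(frozen=True)
class Round:
    """One round of a sequence (hypothetical model for illustration)."""
    number: int
    level: str
    is_curveball: bool


def round_level(number: int) -> str:
    """Return the difficulty level a round targets: L2 for Rounds 1-2, L3 for Rounds 3-5."""
    if not 1 <= number <= ROUNDS_PER_SEQUENCE:
        raise ValueError(f"round must be 1..{ROUNDS_PER_SEQUENCE}, got {number}")
    return "L2" if number <= 2 else "L3"


def build_sequence() -> list[Round]:
    """Build the five rounds of a sequence with their levels and curveball flag."""
    return [
        Round(n, round_level(n), n == CURVEBALL_ROUND)
        for n in range(1, ROUNDS_PER_SEQUENCE + 1)
    ]
```

Calling `build_sequence()` yields five `Round` objects whose `level` fields follow the L2-then-L3 escalation, which can be handy when tagging practice notes by round.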
Tips for Interviewers
- Do not skip or rearrange rounds, apart from the exception in the last tip below; the escalation is designed to build
- The curveball (Round 4) is where most candidates differentiate themselves; watch for honesty versus bluffing
- Strong candidates connect technical decisions to business outcomes in Round 5 without being prompted
- Exception: if a candidate gives a textbook answer in Round 1, accelerate to Round 3
Wiki Navigation
Prerequisites
- Interview: Deployment Stuck Progressing (Scenario, L2)
Related Content
- Mental Models (Core Concepts) (Topic Pack, L0) — CI/CD, Kubernetes Core, Linux Fundamentals
- Case Study: Disk Full — Runaway Logs, Fix Is Loki Retention (Case Study, L2) — Linux Fundamentals, Prometheus
- Case Study: HPA Flapping — Metrics Server Clock Skew, Fix Is NTP (Case Study, L2) — Kubernetes Core, Linux Fundamentals
- Case Study: Job Queue Backlog — Worker Pod CPU Throttled by cgroup (Case Study, L2) — Kubernetes Core, Linux Fundamentals
- Ops Archaeology: The Alerts That Stopped Firing (Case Study, L2) — Kubernetes Core, Prometheus
- Ops Archaeology: The Job That Succeeded Wrong (Case Study, L2) — Kubernetes Core, Linux Fundamentals
- Ops Archaeology: The Slow Death Nobody Noticed (Case Study, L2) — Linux Fundamentals, Prometheus
- Platform Engineering Patterns (Topic Pack, L2) — CI/CD, Kubernetes Core
- /proc Filesystem (Topic Pack, L2) — Linux Fundamentals
- AWS Networking (Topic Pack, L1) — TCP/IP
Pages that link here
- /proc Filesystem
- /proc Filesystem - Primer
- AWS Networking - Primer
- Advanced Bash for Ops
- Alerting Rules
- Alerting Rules - Skill Check
- Alerting Rules Drills
- CI Pipeline
- CI/CD - Skill Check
- CI/CD Drills
- CI/CD Pipeline Architecture
- CI/CD Pipelines - Primer
- Capacity Planning - Primer
- Chaos Engineering & Fault Injection - Primer
- DHCP & IP Address Management - Primer