
Interview Gauntlet: Kubernetes or Simpler Orchestrator?

Category: Architecture Trade-offs
Difficulty: L2-L3
Duration: 15-20 minutes
Domains: Kubernetes, Platform Engineering


Round 1: The Opening

Interviewer: "A startup with 3 services and 4 engineers is considering Kubernetes for their production infrastructure. They're currently deploying with Docker Compose on a single EC2 instance. Should they adopt Kubernetes?"

Strong Answer:

"For 3 services, 4 engineers, and a single EC2 instance, Kubernetes is almost certainly overkill. The operational overhead of running even a managed Kubernetes cluster (EKS, GKE, AKS) includes: understanding pod specs, services, ingress controllers, RBAC, namespaces, Helm charts, kubectl debugging, node group management, and the networking model. That's a significant learning curve for a 4-person team that could be shipping features. I'd recommend an intermediate step: ECS with Fargate if they're on AWS, Cloud Run on GCP, or Azure Container Apps. These provide container orchestration — health checks, auto-scaling, rolling deployments, load balancing — without the Kubernetes operational burden. If they want to stay on bare instances, Docker Compose with a simple blue-green deployment script works for 3 services. The decision framework: Kubernetes becomes worth it when the operational problems it solves (multi-service orchestration, complex scheduling, extensive ecosystem of operators) outweigh the operational problems it creates (cluster management, networking complexity, YAML configuration). For 3 services, the balance doesn't favor Kubernetes."
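
The "simpler" baseline the answer describes can be sketched concretely. Below is a minimal, hypothetical Compose file for the 3-service setup: `restart` policies and healthchecks give crash recovery and health gating on a single host without any orchestrator. Service and image names are invented for illustration.

```yaml
# docker-compose.yml — hypothetical 3-service, single-host setup
services:
  api:
    image: registry.example.com/api:v1.4.2   # hypothetical image
    restart: unless-stopped                  # auto-restart on crash; the closest Compose gets to self-healing
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8080/healthz"]
      interval: 10s
      timeout: 2s
      retries: 3
    ports:
      - "8080:8080"
  worker:
    image: registry.example.com/worker:v1.4.2
    restart: unless-stopped
    depends_on:
      api:
        condition: service_healthy           # start only after the API healthcheck passes
  web:
    image: registry.example.com/web:v2.1.0
    restart: unless-stopped
    ports:
      - "80:80"
```

What this setup lacks — multi-host redundancy and rolling deploys — is exactly the gap the weak answers below gloss over.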

Common Weak Answers:

  • "Yes, learn Kubernetes now so you're ready when you scale." — Learning Kubernetes without needing it is a time investment with delayed and uncertain payoff for a startup that might pivot.
  • "Just use Docker Compose forever." — Docker Compose on a single host has no redundancy, no auto-healing, no rolling deployments. The current setup has real problems; the question is whether Kubernetes is the right solution.
  • "It depends on what they're building." — True, but a strong answer still engages with the given constraints (3 services, 4 engineers, 1 instance) instead of deferring.

Round 2: The Probe

Interviewer: "They push back: 'But we plan to grow to 20 services and 30 engineers in two years. Won't we have to migrate to Kubernetes eventually anyway? Better to start now.' How do you respond?"

What the interviewer is testing: Ability to reason about premature infrastructure investment: weighing the cost of migrating later against the cost of adopting too early.

Strong Answer:

"The argument 'we'll need it eventually' sounds logical but has hidden costs. First, the startup might not grow to 20 services — most startups never reach that scale. Investing in Kubernetes now is optimizing for a future that may not arrive. Second, the migration cost from ECS or Cloud Run to Kubernetes is well understood and bounded: you're already running containers, so the application code doesn't change. You need to write Kubernetes manifests (mostly a templating exercise), set up a cluster, and update your CI/CD pipeline. For a team that's already running containers, this is a 2-4 week migration when they're ready. That's much cheaper than 4 engineers spending 2 years learning to operate Kubernetes when they could have been shipping product. Third, the Kubernetes ecosystem moves fast — best practices, tooling, and parts of the API surface change significantly over 2 years. Investing heavily now means re-learning and re-architecting when you actually scale. My recommendation: start with a simpler container platform now. Invest the engineering time in clean service interfaces, good CI/CD, and observability. These investments transfer directly to Kubernetes when the time comes. Plan to evaluate Kubernetes when you hit 8-10 services or 15+ engineers — that's usually when the coordination problems Kubernetes solves start to bite."
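
The "templating exercise" claim can be made concrete: an ECS service definition maps fairly mechanically onto a Kubernetes Deployment plus Service. A hedged sketch of that mapping (names and image are hypothetical):

```yaml
# deployment.yaml — rough Kubernetes equivalent of an ECS service (hypothetical names)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api
spec:
  replicas: 2                      # corresponds to ECS desiredCount
  selector:
    matchLabels: { app: api }
  template:
    metadata:
      labels: { app: api }
    spec:
      containers:
        - name: api
          image: registry.example.com/api:v1.4.2   # same container image as on ECS
          ports:
            - containerPort: 8080
          readinessProbe:          # corresponds to the ECS/ALB health check
            httpGet: { path: /healthz, port: 8080 }
---
apiVersion: v1
kind: Service
metadata:
  name: api                        # gives the workload a stable DNS name inside the cluster
spec:
  selector: { app: api }
  ports:
    - port: 80
      targetPort: 8080
```

Because the image, port, and health check carry over unchanged, most of the migration effort is in the surrounding cluster and CI/CD setup, not in the manifests themselves.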

Trap Alert:

If the candidate bluffs here: The interviewer might ask "How long does a migration from ECS to EKS typically take for a 10-service team?" There's no universal answer, but a reasonable range is 4-8 weeks for the infrastructure migration, assuming containerized applications with no Kubernetes-specific features needed. The candidate should acknowledge this is an estimate and that the actual time depends on how Kubernetes-specific the team wants to go (Helm charts, operators, service mesh, etc.).


Round 3: The Constraint

Interviewer: "They've grown. Now they have 15 services, 20 engineers, and they're on ECS. They're hitting real pain: service discovery is hacky, config management is scattered, and the deployment process is different for each service. Now do you recommend Kubernetes?"

Strong Answer:

"Now the trade-off shifts. With 15 services and 20 engineers, the consistency and ecosystem benefits of Kubernetes start to outweigh the operational cost. Their current pain points — hacky service discovery, scattered config, inconsistent deployments — are exactly the problems Kubernetes solves out of the box: DNS-based service discovery, ConfigMaps and Secrets for config management, and Deployments for standardized rollout strategies. But I'd still recommend managed Kubernetes (EKS) over self-managed, and I'd invest in the platform layer before migrating workloads. Step one: set up EKS with managed node groups, a cluster autoscaler, and a basic ingress controller. Step two: build the platform — a Helm chart template that all services use, a CI/CD pipeline template that builds, tests, and deploys, and an observability stack (Prometheus + Grafana + Loki). Step three: migrate services incrementally, starting with the least critical ones to build confidence. Step four: standardize. Once 3-4 services are on Kubernetes and the patterns are proven, create a service onboarding playbook and migrate the rest. The key insight: Kubernetes adoption at 15 services is a platform engineering project, not a container lift-and-shift. It requires investment in shared tooling and standards, ideally with 1-2 engineers dedicated to the platform."
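
The "out of the box" claim can be illustrated: with a ConfigMap and a Service, scattered config and custom discovery collapse into platform built-ins. A hedged sketch, with all names hypothetical:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: billing-config
data:
  # Other services are reached via stable cluster DNS names —
  # no custom service-discovery layer needed.
  PAYMENTS_URL: "http://payments.default.svc.cluster.local"
  LOG_LEVEL: "info"
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: billing
spec:
  replicas: 2
  selector:
    matchLabels: { app: billing }
  template:
    metadata:
      labels: { app: billing }
    spec:
      containers:
        - name: billing
          image: registry.example.com/billing:v3.0.1
          envFrom:
            - configMapRef:
                name: billing-config   # config lives next to the workload, versioned in git
```

The same pattern applied across all 15 services is what turns "scattered config" into one convention — which is the platform-engineering point of the answer above.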

The Senior Signal:

What separates a senior answer: Framing Kubernetes adoption as a platform engineering project rather than an infrastructure migration. The value of Kubernetes at scale comes from standardization — every service deploys the same way, configures the same way, and is observed the same way. Without platform investment, you just move the inconsistency from ECS to Kubernetes. Also: recommending incremental migration starting with non-critical services, not a big-bang move-everything weekend.


Round 4: The Curveball

Interviewer: "An SRE on the team argues: 'We should use Nomad instead of Kubernetes. It's simpler, it can orchestrate containers and non-container workloads, and HashiCorp integrates it with Vault and Consul.' Has the SRE got a point?"

Strong Answer:

"It's a valid point that deserves fair evaluation. Nomad is genuinely simpler than Kubernetes — it's a single binary, the scheduling model is more straightforward, and it handles both container and non-container workloads (which is useful if you have legacy batch jobs or raw binary executables). The Consul + Vault integration is real and well-tested. Where Nomad falls short compared to Kubernetes: ecosystem. Kubernetes has a massive third-party ecosystem — operators for databases, message queues, monitoring stacks, CI/CD tools, service meshes, and hundreds of CRDs and controllers. If you need a PostgreSQL operator, a Prometheus operator, a cert-manager, or a GitOps controller, Kubernetes has mature, community-supported options. Nomad's ecosystem is smaller. The other factor is hiring: almost every DevOps/SRE candidate in the market knows or is learning Kubernetes. Nomad expertise is much rarer. When you need to hire your next SRE, the Kubernetes talent pool is 10x larger. My assessment: for teams that are heavy HashiCorp users (already on Consul, Vault, and Terraform), Nomad is a coherent choice that's genuinely simpler. For teams starting fresh with no HashiCorp investment, Kubernetes' ecosystem and talent pool advantage is hard to overcome. I'd ask the SRE: 'Are you arguing for Nomad because it's technically better for our workloads, or because you know HashiCorp tools well?' Both are valid reasons, but they lead to different conversations."
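
For comparison, a minimal Nomad job spec for the same kind of service might look like the sketch below (HCL; names hypothetical). The flat, single-file structure is part of what the SRE means by "simpler":

```hcl
# api.nomad — hypothetical Nomad job for the same container workload
job "api" {
  datacenters = ["dc1"]

  group "api" {
    count = 2

    network {
      port "http" { to = 8080 }
    }

    task "api" {
      driver = "docker"

      config {
        image = "registry.example.com/api:v1.4.2"
        ports = ["http"]
      }

      service {
        name = "api"              # registers in Consul for service discovery
        port = "http"
        check {
          type     = "http"
          path     = "/healthz"
          interval = "10s"
          timeout  = "2s"
        }
      }
    }
  }
}
```

One job file covers scheduling, health checking, and Consul registration — but note that the discovery and secrets story assumes the Consul and Vault investment the answer describes.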

Trap Question Variant:

The right answer is "Both are viable, with different strengths." Candidates who dismiss Nomad as "nobody uses it" are showing ecosystem bias. Candidates who advocate strongly for Nomad without acknowledging the ecosystem and hiring advantages of Kubernetes are showing personal preference over pragmatism. The senior signal is evaluating both fairly and identifying the actual deciding factors for this specific team.


Round 5: The Synthesis

Interviewer: "Regardless of whether you choose Kubernetes, Nomad, or ECS — what makes any orchestrator adoption successful or unsuccessful?"

Strong Answer:

"Three things determine success. First, invest in the developer experience layer, not just the infrastructure. Engineers shouldn't need to write 200 lines of YAML to deploy a service. Build templates, CLI tools, or a developer portal that abstracts the orchestrator. The best signal is: can a new engineer deploy their first service in under an hour without understanding the underlying orchestrator? If yes, the adoption will succeed. If they need to learn Kubernetes concepts first, adoption will be slow and painful. Second, standardize early, customize later. Start with one way to deploy, one way to configure, one way to observe. When teams start with 'everyone can choose their own tooling on top of the orchestrator,' you get 15 services with 15 different deployment patterns, which is worse than what you had before. Establish golden paths and only deviate when there's a compelling reason. Third, own the migration timeline. Don't force all teams to move at once — some teams have more urgent feature work. But also don't let migration stall — set a sunset date for the old platform and provide support for teams that are slow to move. The unsuccessful adoptions I've seen all share one trait: the platform team built the infrastructure but didn't build the developer experience, so teams avoided the platform and kept deploying the old way."
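
The "developer experience layer" point can be sketched in code: instead of hand-writing manifests, engineers fill in a tiny service descriptor and a platform tool expands it into full manifests with golden-path defaults. A minimal illustrative sketch in Python — the descriptor schema and the defaults are invented for this example, not any real platform's API:

```python
# Sketch of a platform tool that expands a small service descriptor
# into a full Kubernetes Deployment manifest (as a Python dict).
# The descriptor schema and golden-path defaults are hypothetical.

GOLDEN_PATH_DEFAULTS = {
    "replicas": 2,
    "port": 8080,
    "health_path": "/healthz",
}

def render_deployment(descriptor: dict) -> dict:
    """Expand a two-field service descriptor into a Deployment manifest."""
    spec = {**GOLDEN_PATH_DEFAULTS, **descriptor}  # descriptor overrides defaults
    name = spec["name"]
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": spec["replicas"],
            "selector": {"matchLabels": {"app": name}},
            "template": {
                "metadata": {"labels": {"app": name}},
                "spec": {
                    "containers": [{
                        "name": name,
                        "image": spec["image"],
                        "ports": [{"containerPort": spec["port"]}],
                        "readinessProbe": {
                            "httpGet": {"path": spec["health_path"],
                                        "port": spec["port"]},
                        },
                    }],
                },
            },
        },
    }

if __name__ == "__main__":
    # What a service team actually writes: name + image, nothing else.
    manifest = render_deployment({"name": "api",
                                  "image": "registry.example.com/api:v1.4.2"})
    print(manifest["spec"]["replicas"])  # golden-path default applied -> 2
```

The design choice this illustrates: the team owns one template, every service deploys the same way, and deviations from the golden path become deliberate, reviewable overrides rather than accidental drift.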

What This Sequence Tested:

Round  Skill Tested
-----  ------------
1      Context-appropriate infrastructure selection
2      Cost-benefit analysis of premature infrastructure investment
3      Platform engineering approach to Kubernetes adoption
4      Fair technology comparison and evaluation of alternatives
5      Platform adoption strategy and developer experience thinking
