# Interview Gauntlet: CI/CD for a Monorepo

**Category:** System Design · **Difficulty:** L2-L3 · **Duration:** 15-20 minutes · **Domains:** CI/CD, Deployment Strategy

## Round 1: The Opening
Interviewer: "Design a CI/CD pipeline for a monorepo that contains 8 microservices, shared libraries, and infrastructure-as-code. How do you structure it?"
### Strong Answer
"The key challenge with a monorepo CI/CD is selective execution — you don't want to build and deploy all 8 services when someone changes one service. I'd use path-based triggering: each service lives in its own directory, and the CI configuration maps directory paths to pipeline stages. In GitHub Actions, that's paths filters on workflows; in a tool like Bazel or Nx, it's the dependency graph. For each service, the pipeline has: lint, unit test, build container image, push to registry, deploy to staging. Shared libraries are trickier — a change there needs to trigger builds for all services that depend on that library. I'd model this as an explicit dependency graph, either using a build tool that understands it natively like Bazel, or a simpler approach with a CODEOWNERS-style mapping file that lists which services depend on which shared libs. The IaC portion runs separately — Terraform plan on PR, apply on merge to main."
### Common Weak Answers
- "Just run everything on every push." — Doesn't scale. With 8 services, you're wasting CI minutes and slowing feedback loops for everyone.
- "Use separate repos for each service." — Answers a different question. The interviewer asked about a monorepo; pivoting to multirepo avoids the design challenge.
- No mention of shared library dependencies — This is the hardest part of monorepo CI/CD, and skipping it suggests no real experience.
## Round 2: The Probe
Interviewer: "How do you handle the shared library dependency problem specifically? Service A and Service B both import a shared auth library. Someone changes the auth library. Walk me through exactly what happens."
What the interviewer is testing: Whether the candidate has actually dealt with diamond dependency problems in monorepos or is just reciting theory.
### Strong Answer
"When the auth library changes, the CI system needs to know that Services A and B both depend on it. There are a few ways to do this. The simplest is a dependency manifest — a YAML or JSON file at the repo root that maps shared/auth to [services/service-a, services/service-b]. The CI pipeline reads this on every PR and expands the affected paths. A more robust approach is using a build tool with a dependency graph — Bazel's rdeps query can tell you exactly which targets are affected by a change to //shared/auth:lib. In practice, I'd use bazel query 'rdeps(//..., //shared/auth/...)' to get the list, then run tests and builds for only those targets. The PR would show: auth library tests pass, Service A tests pass, Service B tests pass, both images rebuilt. Services C through H are untouched. One gotcha: you need to version or pin the shared library somehow, or at least have integration tests, because the shared lib change could break a service in a way that unit tests don't catch."
### Trap Alert
If the candidate bluffs here: The interviewer will ask "What's the cold-cache build time for Bazel on a repo this size?" or "Have you actually used Bazel, or is this theoretical?" It's perfectly fine to say "I've used path-based triggering with GitHub Actions in practice, and I've studied Bazel's approach but haven't operated it in production. The dependency manifest approach is what I've actually shipped."
## Round 3: The Constraint
Interviewer: "Now add canary deployments and zero-downtime database migrations to this pipeline. Service C has a PostgreSQL database that needs a schema change alongside its code change. How does the pipeline handle that?"
### Strong Answer
"Database migrations and canary deploys create a version compatibility requirement: the old code and the new code need to work with the same database schema simultaneously during the canary window. So the pipeline enforces a two-phase migration pattern. Phase 1: deploy a 'forward-compatible' migration that adds new columns or tables but doesn't remove or rename anything the old code uses. The canary runs the new code against this expanded schema. Phase 2: after the canary is promoted to 100%, a cleanup migration removes deprecated columns. In the pipeline, this looks like: the PR includes a migration file that's tagged as either expand or contract. The pipeline validates that the migration is backward-compatible by running the old code's test suite against the new schema. For the canary itself, I'd use Argo Rollouts or Flagger — the pipeline deploys the new version to the canary, runs analysis (error rate, latency from Prometheus), and auto-promotes or rolls back. The migration runs as a Kubernetes Job before the canary starts, with a pre-sync hook in Argo CD or a Helm pre-upgrade hook."
### The Senior Signal
What separates a senior answer: Naming the expand-and-contract migration pattern explicitly and understanding why it exists — because canary deployments mean two versions of code run simultaneously against the same database. Junior candidates try to run the migration "during the deploy" without considering that the old pods are still reading and writing.
## Round 4: The Curveball
Interviewer: "You wake up Monday morning and the monorepo now has 50 microservices, not 8. CI takes 45 minutes even with path-based filtering because the dependency graph is so interconnected. How do you fix this without splitting the repo?"
### Strong Answer
"At 50 services with deep interconnections, the CI time problem is usually a combination of test time and build time. I'd attack both. First, remote caching — Bazel's remote cache or a tool like Turborepo's remote cache means that if Service A's build output hasn't changed (same inputs, same hash), it's pulled from cache in seconds instead of rebuilt. This alone can cut CI time by 60-80% for PRs that touch a few services. Second, test impact analysis — instead of running all tests for affected services, use code coverage data to identify which tests actually exercise the changed code paths. Tools like pytest-testmon for Python or Bazel's test trimming can skip tests that can't possibly be affected. Third, parallelize aggressively — split the 50 services into a dependency-ordered DAG and run independent branches in parallel. If Services A, B, and C have no common dependencies, their builds run simultaneously. Fourth, and this is the organizational lever: if the dependency graph is 'so interconnected,' that's an architecture smell. I'd look at whether those shared dependencies can be versioned and published as internal packages rather than source-level imports, which would break the CI coupling."
### Trap Question Variant

Interviewer: "What exact speedup will remote caching give us?" The right answer is "I don't know the exact cache hit rates." Candidates who claim "remote caching gives a 90% speedup" without qualifying it are guessing. The honest version: "In my experience, remote caching helps significantly, but the actual improvement depends on how much code changes per PR. I'd measure cache hit rates for a week before committing to a target."
## Round 5: The Synthesis
Interviewer: "Your team lead says 'this CI/CD system is too complex, I just want to push code and have it deploy.' How do you respond?"
### Strong Answer
"They're right that the developer experience should feel simple — the complexity should be hidden, not exposed. The goal is that a developer changes a file, opens a PR, and sees a green check or a clear error message. They shouldn't need to understand Bazel dependency graphs or canary weight configurations. So I'd invest in the developer experience layer: clear PR status checks that say 'Service A: tests passing, canary deployed to staging' rather than a wall of CI logs. One-click rollback from a Slack bot or dashboard. A deploy label on the PR that triggers production promotion. The underlying system is complex because the problem is complex — 50 services, database migrations, canary analysis — but the interface doesn't have to be. I'd also document the escape hatches: how to force a full rebuild, how to skip canary for hotfixes, how to manually roll back. The worst outcome is when the system is both complex AND opaque, so developers route around it. The system should be complex under the hood but feel like 'push and deploy' for the 90% case."
## What This Sequence Tested
| Round | Skill Tested |
|---|---|
| 1 | CI/CD architecture fundamentals for monorepos |
| 2 | Hands-on experience with dependency management in build systems |
| 3 | Database migration strategy under canary deployment constraints |
| 4 | Performance optimization and intellectual honesty about numbers |
| 5 | Communication of technical complexity and developer experience thinking |
## Prerequisite Topic Packs
- CI/CD Pipelines & Patterns
- CI/CD Pipelines Realities
- GitHub Actions
- Database Ops
- Progressive Delivery