
Decision Tree: Monolith vs Microservices

Category: Architecture Decisions
Starting Question: "Should we decompose this into microservices?"
Estimated traversal: 4-5 minutes
Domains: architecture, microservices, distributed-systems, team-topology, operations


The Tree

Should we decompose this into microservices?
├── Is this a greenfield project?
│   ├── Yes →
│   │   └── Is the team size > 10 engineers who will work on this system?
│   │       ├── Yes →
│   │       │   └── Are domain boundaries clear and stable? (not still being discovered)
│   │       │       ├── Yes →
│   │       │       │   └── Do different domains have genuinely different scaling requirements?
│   │       │       │       ├── Yes → DECISION: Microservices (clear justification)
│   │       │       │       └── No  → DECISION: Modular monolith (start here, extract later)
│   │       │       └── No  → DECISION: Modular monolith (domains not stable enough)
│   │       └── No  → DECISION: Modular monolith (team too small for distributed overhead)
│   │
│   └── No (existing monolith) →
│       └── Is the monolith causing active pain? (deployment contention, scaling bottleneck)
│           ├── No  → DECISION: Stay as modular monolith (don't fix what works)
│           └── Yes →
│               └── Do you have distributed systems expertise on the team?
│                   ├── No  → WARNING: Build expertise first; premature extraction risks > benefits
│                   └── Yes →
│                       └── Is the pain localized to a specific component?
│                           ├── Yes →
│                           │   └── Does that component have different scaling or deployment needs?
│                           │       ├── Yes → DECISION: Extract outlier service (strangler pattern)
│                           │       └── No  → DECISION: Modularize in-process first
│                           └── No (pain is broad across the codebase) →
│                               └── Do different teams own different parts with deployment conflicts?
│                                   ├── Yes →
│                                   │   └── Is the data model loosely coupled between those parts?
│                                   │       ├── Yes → DECISION: Full decomposition (team autonomy justified)
│                                   │       └── No  → WARNING: Data coupling — decouple data first
│                                   └── No  → DECISION: Modular monolith with better module boundaries

Node Details

Check 1: Greenfield vs Existing Monolith

How to assess: Is there any production code or data? A greenfield project has no existing users, no production data, and no established API contracts. Even "we have a prototype" may mean you have a codebase worth treating as an existing system.
What you're looking for: A truly new system with no legacy constraints. The decision path diverges significantly: greenfield allows architectural choice; existing systems carry migration cost and risk.
Common pitfall: Treating a 3-month-old monolith as "greenfield" to justify a rewrite. If you have paying customers or production data, it is an existing system with real migration risk.

Check 2: Team Size > 10 Engineers

How to assess: Count engineers who will actively contribute to and own the system — not just consumers of its APIs. Include the on-call rotation. A team of 5 that grows to 15 in 18 months is a team of 5 today.
What you're looking for: Sufficient team density to absorb the overhead of distributed systems: network debugging, distributed tracing, service discovery, schema coordination, and independent deployment pipelines.
Common pitfall: Counting future headcount. Plan for the team size you have. Under-resourced microservices architectures collapse into a distributed monolith (services that must be deployed together) — the worst of both worlds.

Check 3: Clear and Stable Domain Boundaries

How to assess: Can each proposed service be defined in one sentence without referencing another service's internals? Have the domain boundaries been stable for 3+ months without major refactoring? Do all engineers agree on what belongs in each domain?
What you're looking for: Domain boundaries that reflect stable business capabilities, not technical layers. "User service, order service, payment service" reflects business domains. "Data service, logic service, presentation service" reflects a technical layering that will tightly couple across services.
Common pitfall: Decomposing before domain understanding is complete. Conway's Law works in both directions: your services will reflect your understanding of the domain at the time of decomposition. Premature decomposition crystallizes an incomplete mental model into deployed infrastructure.

Check 4: Different Scaling Requirements

How to assess: Profile the existing or anticipated workload. Which components have load patterns that diverge by more than 10x? Which components have fundamentally different resource profiles (CPU-bound vs memory-bound vs I/O-bound)?
What you're looking for: Components where independent scaling would reduce cost or improve reliability. For example: a read-heavy product catalog that can be scaled independently from a write-heavy order processing pipeline.
Common pitfall: Assuming components need different scaling without data. Most components in a new system do not have measured scaling requirements. Avoid scaling-driven decomposition until you have production load data.
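The 10x divergence check can be run mechanically against measured peaks. A minimal sketch, where the component names and requests-per-second figures are hypothetical illustration data:

```python
# Sketch: flag component pairs whose measured peak load diverges by more
# than 10x. The component names and RPS figures are hypothetical.
from itertools import combinations

peak_rps = {  # measured peak requests/sec per component (illustrative)
    "catalog": 4200,
    "checkout": 180,
    "search": 3900,
    "admin": 12,
}

def divergent_pairs(loads, factor=10):
    """Return component pairs whose peak loads differ by more than `factor`x."""
    return [
        (a, b)
        for a, b in combinations(sorted(loads), 2)
        if max(loads[a], loads[b]) / min(loads[a], loads[b]) > factor
    ]

for a, b in divergent_pairs(peak_rps):
    print(f"{a} vs {b}: peak loads diverge by more than 10x")
```

If no pair clears the threshold on production data, the scaling argument for decomposition does not hold yet.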

Check 5: Active Pain in the Existing Monolith

How to assess: Measure actual pain, not theoretical pain. Deployment contention: how many times per sprint did engineers wait for others' changes before theirs could ship? Scaling bottleneck: which component has been the proximate cause of an SLO breach in the last 90 days?
What you're looking for: Documented, recurring pain traceable to the monolithic structure. "The codebase is big and hard to understand" is a modularization problem, not a microservices problem. "We can't scale the checkout flow without scaling the entire application" is a valid argument for extraction.
Common pitfall: Decomposing for theoretical future pain. Microservices introduce immediate, concrete operational complexity. The pain they solve must be current and measurable to justify the overhead.

Check 6: Distributed Systems Expertise

How to assess: Can your team answer these without looking them up? (a) How do you handle a distributed transaction that partially fails? (b) What is a saga pattern and when do you use it? (c) How do you debug a request that spans three services with no trace correlation? (d) What is idempotency and how do you implement it for a payment retry?
What you're looking for: At least 2-3 engineers who have operated distributed systems in production, have dealt with partial failure scenarios, and understand eventual consistency implications.
Common pitfall: "We'll learn as we build." Distributed systems fail in non-obvious ways that require specific knowledge to debug and recover from. Building that expertise during production incidents is expensive. Read "Designing Data-Intensive Applications" as a minimum baseline before decomposing.
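Question (d) has a compact answer worth sketching. A minimal illustration of idempotency-key handling for a payment retry — the in-memory store and charge function are stand-ins, not a real payment API (a production system uses a durable store and persists the key before the side effect):

```python
# Sketch of idempotency for a payment retry. The store and charge logic
# are hypothetical stand-ins for a real payment gateway integration.
processed = {}  # idempotency_key -> stored result (durable store in production)

def charge(idempotency_key, amount_cents):
    """Execute the charge at most once per key; retries replay the first result."""
    if idempotency_key in processed:
        return processed[idempotency_key]   # retry path: no second side effect
    result = {"status": "charged", "amount": amount_cents}  # side effect runs once
    processed[idempotency_key] = result
    return result

first = charge("order-42", 1999)
retry = charge("order-42", 1999)  # client timed out and retried the same request
assert retry is first             # same result object: the card was charged once
```

The client generates the key (e.g. per order attempt), so a network timeout followed by a retry cannot produce a double charge.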

Check 7: Pain Localized to a Specific Component

How to assess: Review the last 10 incidents or deployment delays. Is a single component responsible for more than 50% of them? Is one module responsible for the bulk of the deployment friction?
What you're looking for: A clear extraction candidate — one service that, if removed from the monolith, would resolve the majority of the identified pain.
Common pitfall: Extracting when pain is spread evenly across the codebase. Removing one service then provides minimal relief while adding distributed system overhead. The correct remedy for broadly distributed pain is usually better in-process modularization, not decomposition.
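The 50% test above is a one-liner over your incident log. A minimal sketch, with a hypothetical list of the components blamed in the last ten incidents:

```python
# Sketch: does one component account for >50% of recent incidents?
# The incident data is hypothetical illustration.
from collections import Counter

incidents = ["billing", "billing", "search", "billing", "billing",
             "auth", "billing", "billing", "catalog", "billing"]

component, count = Counter(incidents).most_common(1)[0]
if count / len(incidents) > 0.5:
    print(f"{component}: extraction candidate ({count}/{len(incidents)} incidents)")
else:
    print("pain is diffuse: modularize in-process instead")
```

If no component crosses the threshold, the tree routes you toward in-process modularization rather than extraction.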

Check 8: Data Model Coupling

How to assess: For the services you want to create, draw the entity-relationship diagram. Count how many foreign key relationships cross the proposed service boundary. If Service A's entities reference Service B's entities via foreign key, they share a data model.
What you're looking for: Services with minimal cross-boundary data references. Services that share a tightly coupled data model cannot achieve true deployment independence — they must coordinate schema migrations, and any data model change ripples across service boundaries.
Common pitfall: Decomposing services while leaving them sharing a database. This creates a "distributed monolith" — you have network latency and operational overhead with none of the deployment independence benefits. Data decoupling must precede or accompany service decomposition.
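The FK-counting exercise can be scripted once you have a table-to-service ownership map. A minimal sketch; the table names, ownership map, and FK edges are hypothetical (a real version would read them from `information_schema` or your migration files):

```python
# Sketch: count foreign keys that cross a proposed service boundary.
# Ownership map and FK edges are hypothetical illustration data.
ownership = {          # table -> proposed owning service
    "users": "user", "orders": "order", "order_items": "order",
    "payments": "payment", "products": "catalog",
}
foreign_keys = [       # (referencing table, referenced table)
    ("orders", "users"),         # crosses order -> user
    ("order_items", "orders"),   # internal to the order service
    ("order_items", "products"), # crosses order -> catalog
    ("payments", "orders"),      # crosses payment -> order
]

crossing = [
    (src, dst) for src, dst in foreign_keys
    if ownership[src] != ownership[dst]
]
print(f"{len(crossing)} cross-boundary foreign keys: {crossing}")
```

Every edge in `crossing` is a schema migration that two services would have to coordinate; each one argues for data decoupling before decomposition.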


Terminal Actions

Decision: Modular Monolith

Choose: Single deployable unit with strong internal module boundaries enforced by code structure (packages, modules, visibility rules) and architectural tests.
Why: A well-structured monolith has lower operational complexity, simpler debugging, atomic deployments, and no distributed transaction problem. For most systems under 10 engineers, a modular monolith delivers 90% of the organizational benefits of microservices with 10% of the operational cost.
Next step: Invest in module boundaries: define public APIs between modules, disallow direct cross-module database access, enforce boundaries with ArchUnit, dependency-cruiser, or similar tooling. This positions you for future extraction if scaling requirements emerge.
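An architectural boundary test in the spirit of ArchUnit or dependency-cruiser can be sketched in a few lines. The module names and dependency map below are hypothetical; a real test would derive the observed dependencies from source (e.g. by parsing imports with `ast`) rather than hard-coding them:

```python
# Sketch of a boundary test: fail CI when a module depends on a module
# it is not allowed to. Names and dependency data are hypothetical.
ALLOWED = {                       # module -> modules it may depend on
    "orders":  {"catalog", "shared"},
    "catalog": {"shared"},
    "billing": {"orders", "shared"},
    "shared":  set(),
}

observed = [                      # dependencies found in the codebase
    ("orders", "catalog"),
    ("billing", "orders"),
    ("catalog", "billing"),       # violation: catalog must not reach billing
]

violations = [(src, dst) for src, dst in observed if dst not in ALLOWED[src]]
print(f"boundary violations: {violations}")  # a CI test asserts this is empty
```

Run on every commit, this keeps module boundaries from eroding silently between architecture reviews.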

Decision: Extract Outlier Service (Strangler Pattern)

Choose: Extract the one or two components with genuinely different scaling or deployment requirements while leaving the rest of the application in the monolith.
Why: The strangler fig pattern (incrementally extracting services from a monolith) provides a controlled migration path. You gain deployment independence for the specific components that need it without incurring the full overhead of a microservices architecture.
Next step: Identify the extraction candidate. Build the new service behind an interface that the monolith currently satisfies. Switch over traffic incrementally (feature flag or proxy). Verify the extracted service runs correctly before removing the in-monolith implementation. Do not extract the next service until the first extraction is stable in production.
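The incremental switchover step can be sketched as a percentage-based router. The flag value and handler functions are hypothetical; the key property is that routing is deterministic per request key, so a given order always lands on the same implementation while the flag is ramped up:

```python
# Sketch: feature-flag traffic routing during a strangler extraction.
# The percentage and handlers are hypothetical stand-ins.
import hashlib

EXTRACTED_PERCENT = 25  # ramp: share of traffic sent to the new service

def handle_in_monolith(request_id):
    return f"monolith handled {request_id}"

def handle_in_new_service(request_id):
    return f"new service handled {request_id}"

def route(request_id):
    """Deterministically route a stable slice of traffic to the new service."""
    bucket = int(hashlib.sha256(request_id.encode()).hexdigest(), 16) % 100
    if bucket < EXTRACTED_PERCENT:
        return handle_in_new_service(request_id)
    return handle_in_monolith(request_id)

# the same request key always lands on the same implementation
assert route("order-17") == route("order-17")
```

Hashing the request key (rather than random sampling) avoids a single user flip-flopping between implementations mid-session; ramping is just raising the constant and watching error rates.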

Decision: Full Microservices Architecture

Choose: Independent services with separate repositories, deployment pipelines, databases, and on-call rotations per service team.
Why: Full decomposition is justified when team size requires independent deployment velocity, domains are stable and well-understood, teams own distinct domains end-to-end, and scaling requirements genuinely diverge across components. At this scale, the operational overhead is offset by team autonomy and independent delivery velocity.
Next step: Establish the foundational platform capabilities before decomposing: service discovery, distributed tracing, centralized logging, a service-level SLO framework, and a shared secrets management system. Do not begin decomposition before the platform is production-ready.

Decision: Stay as Monolith (No Active Pain)

Choose: Continue with the current architecture. Improve internal structure, testing, and documentation rather than restructuring deployment topology.
Why: "If it ain't broke, don't fix it" applies strongly to architecture. Architectural migrations are expensive, risky, and distract from feature delivery. The burden of proof is on the migration to demonstrate measurable current pain that the new architecture will reliably resolve.
Next step: Define the leading indicators that would trigger an architectural review: deployment contention rate, on-call incident rate by component, scaling cost inflection points. Revisit when those thresholds are crossed.

Decision: Modularize In-Process First

Choose: Refactor the monolith into well-defined in-process modules without changing the deployment model.
Why: When a specific component causes pain but its pain is organizational (unclear ownership, too many changes in one area) rather than operational (scaling, deployment independence), in-process modularization resolves the root cause without introducing distributed system complexity.
Next step: Define module boundaries with explicit public APIs. Enforce that no module directly accesses another module's database tables. Add architectural tests that fail when boundary rules are violated. Once boundaries are clean and stable for 2-3 months, extraction to a separate service becomes low-risk if deployment independence is then needed.


Warning: Build Distributed Systems Expertise First

When: Active pain exists in the monolith but the team lacks distributed systems experience.
Risk: A premature microservices decomposition by an inexperienced team typically results in a distributed monolith — services that are tightly coupled through shared databases, synchronous call chains with no circuit breakers, and coordinated deploys. This is operationally worse than the original monolith and harder to undo.
Mitigation: Send 2-3 engineers through hands-on distributed systems training. Run chaos engineering experiments on the current system to build incident response skills. Extract one non-critical service as a learning exercise before decomposing the critical path. Read postmortems from organizations that have done this migration (Segment, Amazon, Shopify all have published accounts).

Warning: Decouple Data Before Services

When: You are planning to create service boundaries that cut across tightly coupled database tables.
Risk: Services sharing a database cannot be deployed independently. A schema migration for one service breaks the other. The operational complexity of microservices is incurred without the deployment independence benefit — a distributed monolith.
Mitigation: Spend one or two quarters on data model decomposition before service decomposition. Create separate schemas within the shared database, enforce that each schema is accessed only by its owning module, eliminate cross-schema foreign keys, and replace them with eventual consistency patterns. Once data is logically separated, physical service separation becomes straightforward and low-risk.
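What "replace foreign keys with eventual consistency patterns" looks like in practice: once the database-enforced constraint is gone, an async reconciliation job detects (rather than prevents) dangling references. A minimal sketch with hypothetical data sets standing in for the two schemas:

```python
# Sketch: reconciliation replacing a dropped cross-schema foreign key.
# The id sets are hypothetical stand-ins for queries against each schema.
order_customer_ids = {"c1", "c2", "c9"}   # ids the orders schema references
customer_ids = {"c1", "c2", "c3"}         # ids the customers schema owns

# the FK used to block orphans at write time; now they are found eventually
orphans = order_customer_ids - customer_ids
for cid in sorted(orphans):
    print(f"orphaned reference to customer {cid}: queue for repair")
```

The trade is deliberate: writes no longer block on cross-schema integrity, and a scheduled job plus a repair queue absorb the (normally rare) inconsistencies.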


Edge Cases

  • Platform/infrastructure services: Services that are genuinely shared infrastructure (auth, notification, payment) often have valid microservices justifications even in small teams. A team of 6 might reasonably extract a payment service to isolate PCI scope. Use domain criticality and compliance scope as additional criteria for these decisions.
  • Different SLO requirements: If one component requires 99.99% availability while the rest of the system runs at 99.9%, extraction can be justified to allow independent reliability investment without dragging up the entire system's operational cost.
  • Regulatory isolation: Services that process regulated data (PII, financial, health) may need to be isolated for audit, access control, or compliance reasons regardless of team size or scaling requirements. Compliance isolation is a valid decomposition driver independent of operational scaling concerns.
  • Acquired codebase (M&A): When a codebase is acquired, run it as a separate service initially rather than integrating immediately. Integration is irreversible, and running separately preserves optionality while you learn the codebase. Merge only when you have a clear technical and organizational case for doing so.
  • Microservices already deployed but causing pain: If you already have a microservices architecture generating operational pain disproportionate to its benefits, consider selective re-merging (service consolidation). Merging 3-4 closely collaborating services into a modular monolith is a legitimate architectural move that reduces operational complexity. This is rarely discussed but frequently the right answer for over-decomposed systems.

Cross-References