Mental Model: Strangler Fig¶
Category: Architecture & Design
Origin: Martin Fowler (2004), inspired by the strangler fig tree (Ficus aurea)
One-liner: Incrementally replace a legacy system by building new functionality around it and gradually routing traffic away until the old system can be safely retired.
The Model¶
The strangler fig tree begins life as a seed in the canopy, dropped there by birds. It grows roots down through the host tree, eventually encasing the trunk entirely. Over years, the fig tree becomes self-supporting while the host tree rots away inside. From the outside, you see only the fig. The host is gone, but there was never a moment of catastrophic replacement — the transition was continuous.
Fowler named his migration pattern after this tree because the metaphor maps almost exactly. The strangler fig pattern says: do not attempt a big-bang rewrite of a legacy system. Instead, build new capability in a new system alongside the old one, intercept traffic at the seams (API gateways, load balancers, feature flags), route new behavior to the new system, and keep the legacy system running the old behavior until its surface area shrinks to nothing. Then decommission it. If the new system ever fails catastrophically, you have the old one to fall back to.
The pattern requires three infrastructure pieces: a facade (the interception layer — usually an API gateway, reverse proxy, or load balancer), the new system (the replacement being built incrementally), and the legacy system (running in parallel during the transition). The facade is the key: it must be able to route any given request to either system, ideally with routing logic that can be changed without deploying either application. Traffic shifts gradually — 1% to new, then 10%, then 50%, then 100% — with the ability to roll back routing at any point.
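As a concrete sketch, the facade's routing decision can be as small as a path rule plus a deterministic percentage dial. The names below are illustrative, and a real deployment would usually express this in the gateway's configuration rather than application code:

```python
import hashlib

def choose_backend(path: str, user_id: str, new_percent: int) -> str:
    """Facade routing sketch: path ownership first, then a gradual
    percentage rollout. Hashing the user id makes routing sticky, so
    turning the dial from 1 to 10 to 50 never flip-flops a user."""
    if path.startswith("/api/v2/"):  # paths the new system already owns
        return "new"
    # Deterministic bucket in [0, 100) derived from the user id.
    bucket = int(hashlib.sha256(user_id.encode()).hexdigest(), 16) % 100
    return "new" if bucket < new_percent else "legacy"
```

Because the bucket depends only on the user id, raising `new_percent` only ever moves users from legacy to new; rolling back is lowering the number again.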
The boundary conditions where this pattern applies are specific: the legacy system must have a clear seam — a network interface, an API surface, a message queue — where interception is possible. If the legacy is a library linked directly into a monolith with no network boundary, the strangler fig requires first extracting that boundary. The pattern also assumes the new and old systems can coexist with shared or synchronized data, which is often the hardest part. Two systems reading the same database is manageable. Two systems writing the same database without coordination is a consistency disaster.
Where this pattern fails: teams that build the facade and new system but never actually shift traffic — the "strangler fig that never strangles." This happens when the new system perpetually lags behind the legacy's feature set, or when organizational pressure prevents cutting over. The pattern demands a migration plan with a deadline, not just a parallel implementation strategy.
Visual¶
```
PHASE 1: FACADE INTRODUCED (day 0)

                ┌───────────────────┐      ┌─────────────────┐
Client ───────► │ API Gateway/Proxy │ ───► │ Legacy System   │
                └───────────────────┘      │ (100% traffic)  │
                                           └─────────────────┘

PHASE 2: NEW SYSTEM BUILT ALONGSIDE (incremental)

                ┌───────────────────┐ /old/* ┌─────────────────┐
Client ───────► │ API Gateway/Proxy │ ─────► │ Legacy System   │
                └─────────┬─────────┘        │ (80% traffic)   │
                          │ /new/*           └─────────────────┘
                          ▼
                ┌─────────────────┐
                │ New System      │
                │ (20% traffic)   │
                └─────────────────┘

PHASE 3: TRAFFIC SHIFTED (gradual cutover)

                ┌───────────────────┐ legacy only  ┌─────────────────┐
Client ───────► │ API Gateway/Proxy │ ───────────► │ Legacy System   │
                └─────────┬─────────┘              │ (10% traffic)   │
                          │ everything else        └─────────────────┘
                          ▼
                ┌─────────────────┐
                │ New System      │
                │ (90% traffic)   │
                └─────────────────┘

PHASE 4: LEGACY RETIRED

                ┌───────────────────┐      ┌─────────────────┐
Client ───────► │ API Gateway/Proxy │ ───► │ New System      │
                └───────────────────┘      │ (100% traffic)  │
                                           └─────────────────┘
                [Legacy decommissioned]

ROUTING DECISION IN FACADE:

┌─────────────────────────────────────────────────┐
│ if request.path starts_with("/api/v2/")         │
│     route to new_system                         │
│ elif feature_flag("new-checkout") is enabled    │
│     route to new_system                         │
│ else                                            │
│     route to legacy                             │
└─────────────────────────────────────────────────┘
```
```mermaid
flowchart LR
    subgraph "Phase 1: Facade"
        C1[Client] --> GW1[API Gateway]
        GW1 -->|100%| L1[Legacy System]
    end
    subgraph "Phase 2: Split"
        C2[Client] --> GW2[API Gateway]
        GW2 -->|80%| L2[Legacy System]
        GW2 -->|20%| N2[New System]
    end
    subgraph "Phase 3: Cutover"
        C3[Client] --> GW3[API Gateway]
        GW3 -.->|10%| L3[Legacy System]
        GW3 -->|90%| N3[New System]
    end
    subgraph "Phase 4: Retired"
        C4[Client] --> GW4[API Gateway]
        GW4 -->|100%| N4[New System]
    end
```
When to Reach for This¶
- You have a working legacy system that cannot be taken offline for a rewrite — it's generating revenue or serving critical traffic right now
- The legacy system has a network boundary (HTTP API, message queue) where a facade can be inserted without touching the legacy codebase
- You need to migrate from a monolith to microservices incrementally, one domain at a time
- Your team is small relative to the legacy surface area — you need to migrate while still shipping features, not pause everything for a rewrite
- You've been burned by a big-bang rewrite before and need a pattern that allows course-correction mid-migration
- You want to validate the new system under real production traffic before fully committing to it
When NOT to Use This¶
- The legacy system has no network boundary — it's an in-process library or a batch job that runs in the same address space as everything else; you must first extract a seam before the strangler fig applies
- The legacy and new systems cannot share or synchronize the same data store — if each needs its own authoritative database with no sync mechanism, you'll create consistency problems that outweigh the migration benefits
- Your team lacks the discipline to actually shift traffic and decommission — if the pattern becomes "run two systems forever," you've doubled operational complexity with no end in sight
- The legacy system is so fragile that inserting a proxy in front of it causes failures — some systems rely on direct connections in ways that the facade disrupts (timing, connection semantics, protocol specifics)
Applied Examples¶
Example 1: Migrating a Rails monolith to microservices¶
A company has a Rails monolith handling orders, users, and inventory. They want to extract the inventory service first. Step 1: introduce an Nginx proxy in front of the monolith — all traffic passes through, behavior unchanged.
```nginx
# Phase 1: facade routes everything to legacy
server {
    listen 80;
    location / {
        proxy_pass http://rails-monolith:3000;
    }
}
```
Step 2: build the new inventory microservice. Step 3: route inventory endpoints to it:
```nginx
# Phase 2: facade routes /api/inventory/* to new service
server {
    listen 80;
    location /api/inventory/ {
        proxy_pass http://inventory-service:8080;
    }
    location / {
        proxy_pass http://rails-monolith:3000;
    }
}
```
Note that the Phase 2 config as written makes the new service authoritative for inventory endpoints the moment it ships. A more cautious intermediate step is shadow mode: the facade mirrors inventory requests to the new service while the monolith's responses are still the ones served, so the two systems' outputs can be compared offline. Once confidence is established, the monolith's inventory logic is disabled and the facade routes are made authoritative. The Rails monolith continues running for orders and users — unaffected.
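One way to run the new service in shadow mode is Nginx's `mirror` directive (available since 1.13.4). A sketch, reusing the upstream names from the example above; nginx discards the mirrored response, so comparison has to happen via logs or metrics on the new service's side:

```nginx
# Shadow mode: the monolith still serves every response, while a copy of
# each inventory request is also sent to the new service for comparison.
server {
    listen 80;
    location /api/inventory/ {
        mirror /inventory-shadow;               # fire-and-forget copy
        proxy_pass http://rails-monolith:3000;  # authoritative response
    }
    location = /inventory-shadow {
        internal;
        proxy_pass http://inventory-service:8080$request_uri;
    }
    location / {
        proxy_pass http://rails-monolith:3000;
    }
}
```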
Example 2: Replacing a legacy payment processor integration¶
A fintech company uses a legacy payment gateway SDK embedded in their checkout service. A new payment provider offers better rates. Rather than rewriting checkout, they:
- Wrap the checkout service's payment calls behind an internal abstraction layer (the seam).
- Deploy the new payment provider integration as a separate service.
- Use a feature flag to route a percentage of transactions to the new provider:
```python
def process_payment(order, user):
    if feature_flags.is_enabled("new-payment-provider", user_id=user.id):
        return new_payment_service.charge(order)
    else:
        return legacy_payment_sdk.charge(order)
```
They start at 1% of users (canary), monitor error rates and success rates, increase to 10%, 50%, and finally 100% over two weeks. At each stage, the rollback is: flip the flag. When 100% is stable for two weeks, the legacy SDK dependency is removed and the flag cleaned up.
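This ramp-up only works if flag evaluation is deterministic per user; a user who lands in the canary at 1% must stay enabled at 10% and beyond. A minimal sketch of such a percentage check (a hypothetical helper; hosted flag services implement the same idea):

```python
import hashlib

def is_enabled(flag: str, user_id: int, rollout_percent: int) -> bool:
    """Deterministic percentage rollout: hash flag+user into one of 100
    buckets. Raising the percentage only ever adds users to the cohort,
    so the 1% canary users remain enabled at 10%, 50%, and 100%."""
    key = f"{flag}:{user_id}".encode()
    bucket = int(hashlib.sha256(key).hexdigest(), 16) % 100
    return bucket < rollout_percent
```

Keying the hash on flag name as well as user id means different flags sample different cohorts, so one user isn't the canary for every experiment at once.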
Example 3: Database migration during strangler fig (the hardest part)¶
The two-system coexistence problem becomes acute when both systems write to the same database. In a migration from a PHP monolith to a Go microservice for the user authentication domain:
Phase 1 — Dual read, legacy writes: The new Go service reads from the same users table the monolith writes to. The Go service never writes; it reads and returns data. This is safe and synchronizes automatically.
Phase 2 — Dual write (danger zone): As the new service starts handling some write paths (password changes, user creation), both systems write to the same table. This requires:
- A shared schema that both systems can interpret (no table renames, no column deletions yet)
- Backward-compatible changes only: column additions are safe; column removals must wait until the legacy system no longer reads them
- Facade routing that sends a given user's write operations to one system at a time (not both simultaneously)
```nginx
# Facade routes auth writes deterministically per user: the first digit
# of the user id picks the backend, so a given user always lands on the
# same system (a sticky first-digit split, not a random per-request one).
map $cookie_user_id $auth_backend {
    ~^[0-4]    legacy-auth:3000;   # ids starting with 0-4: legacy
    default    go-auth:8080;       # everyone else: new service
}
server {
    location /auth/ {
        proxy_pass http://$auth_backend;
    }
}
```
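A first-digit regex is deterministic but rarely an even split over real id sequences. Nginx's `split_clients` directive gives a true hashed percentage split (it hashes the key with MurmurHash2) while remaining sticky per user; a sketch using the same hypothetical upstream names:

```nginx
# Each user id hashes into a fixed percentage band, so shrinking the
# legacy share from 50% to 10% only moves users toward the new service,
# never back.
split_clients "${cookie_user_id}" $auth_backend {
    50%    legacy-auth:3000;
    *      go-auth:8080;
}
server {
    location /auth/ {
        proxy_pass http://$auth_backend;
    }
}
```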
Phase 3 — Legacy reads only its own data: Once all writes go to the new service, the legacy system's write paths are disabled (e.g. they redirect to the new service's endpoints; use 307 rather than 302 so clients preserve the request method and body). The legacy still reads the shared table but is now effectively read-only — the new service is the authoritative writer.
Phase 4 — Schema migration: With the legacy not writing, you can safely migrate the schema to the new service's preferred shape — rename columns, drop legacy columns, add new indexes. A brief maintenance window (or an online migration tool) handles the transition.
This sequence — read-only new system, then dual write with routing discipline, then legacy read-only, then schema migration — is the canonical pattern for strangling a system with a shared database. Skipping any phase risks data inconsistency.
The Junior vs Senior Gap¶
| Junior | Senior |
|---|---|
| Proposes a full rewrite: "we pause feature development for 6 months and rebuild from scratch" | Asks "where are the seams?" before proposing any migration strategy |
| Starts building the new system without a facade — plans to do a cutover when it's "done" | Installs the facade on day one, even when it's a no-op passthrough, to validate the interception layer |
| Migrates everything at once: the new system must match 100% of legacy behavior before any traffic shifts | Identifies the highest-value 20% of functionality, migrates that first, puts it under real traffic |
| Treats the legacy system as an obstacle to be destroyed | Treats the legacy system as a fallback safety net throughout the migration |
| Decommissions the legacy immediately after cutover | Waits 30–90 days of stable production traffic before decommissioning; verifies no hidden callers remain |
| Underestimates data migration — assumes the new system can read from the same DB | Plans data migration explicitly: sync strategy, consistency model, cutover sequence |
Connections¶
- Complements: Circuit Breaker. While the new system receives partial traffic, wrap calls to it in a circuit breaker so that failures in the new service don't take down the facade or spill over into legacy traffic.
- Complements: 12-Factor App. Build the new system to 12-factor constraints from the start, so the migration produces a more operable system, not just a newer one.
- Tensions: Event Sourcing. If the legacy system uses mutable state and the new system uses event sourcing, data synchronization between the two becomes extremely complex; you cannot simply share a database, and a dedicated sync layer is required.
- Topic Packs: cicd, api-gateways