Feature Flags Footguns¶
Mistakes that cause outages, stale experiences, flag evaluation failures, and permanent technical debt.
1. Defaulting to True for a new feature flag¶
You create new-checkout-v2 with a default value of True. The LaunchDarkly SDK fails to initialize (network issue, wrong key, outage). All variation() calls return the default. Every user gets the new, untested checkout flow. This is the opposite of what defaults are for.
Fix: Default values should always be the conservative, safe, backward-compatible option. New features default to False. Kill switches default to True (feature on). The default is your failure mode.
Remember: Think of the default value as "what happens when the flag system is completely down." If LaunchDarkly, Flagsmith, or your custom flag service is unreachable, the SDK returns the in-code default. For new features:
False means "nothing changes if the system fails." For kill switches protecting existing functionality, True means "feature stays on even if the flag system dies."
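As a sketch of this rule, here is a generic wrapper that makes the failure mode explicit. The `get_flag` name, the `client is None` check, and the constants are illustrative assumptions, not a specific SDK API:

```python
def get_flag(client, key, context, default):
    """Evaluate a flag, falling back to the in-code default on any failure.

    The default IS the failure mode: it is what every user gets when the
    flag service is unreachable or the SDK never initialized.
    """
    try:
        if client is None:  # SDK never initialized (bad key, outage, etc.)
            return default
        return client.variation(key, context, default)
    except Exception:
        return default

# New feature: default False -> nothing changes if the flag system fails.
NEW_CHECKOUT_DEFAULT = False
# Kill switch guarding existing functionality: default True -> feature stays on.
SEARCH_ENABLED_DEFAULT = True
```

With this convention, an SDK outage silently degrades to the old checkout and keeps search running, rather than exposing untested code.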
2. Shipping code with a flag but no flag definition in the system¶
You deploy code that evaluates feature-x-enabled but forget to create the flag in LaunchDarkly (or Flagsmith). The SDK returns the default value everywhere. The flag never gets used. Or worse, you create the flag two weeks later with a different default and break behavior.
Fix: Create the flag in your flag system before deploying the code that references it. Add flag creation to your PR checklist or deployment pipeline.
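One way to enforce this in a pipeline is a check that diffs the flag keys referenced in code against the keys defined in your flag system. A minimal sketch, assuming flags are evaluated via `variation("key", ...)` calls and that you can fetch the set of defined keys from your flag service's API; the function names and regex are illustrative:

```python
import re
from pathlib import Path

# Matches the string literal passed as the first argument to variation().
FLAG_CALL = re.compile(r"""variation\(\s*["']([\w-]+)["']""")

def flags_referenced_in(src_dir: str) -> set[str]:
    """Collect every flag key passed to variation() in the source tree."""
    keys: set[str] = set()
    for path in Path(src_dir).rglob("*.py"):
        keys |= set(FLAG_CALL.findall(path.read_text()))
    return keys

def missing_flags(src_dir: str, defined: set[str]) -> set[str]:
    """Flags the code references but the flag system does not define."""
    return flags_referenced_in(src_dir) - defined
```

Failing the build when `missing_flags` is non-empty catches the "code shipped, flag never created" case before deploy.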
3. Not testing both code paths¶
You write tests for the new feature path (flag = True). You never test the legacy path (flag = False). You flip to 100% and remove the old code. Six months later someone tries to roll back to the flag = False path for a different reason — the code path was broken and the tests never caught it.
Fix: For every flagged feature, write tests with the flag in both states. Use a test fixture or mock that controls flag evaluation.
@pytest.mark.parametrize("flag_value", [True, False])
def test_checkout_both_paths(flag_value, monkeypatch):
    monkeypatch.setattr("myapp.flags.get_flag", lambda key, ctx, default: flag_value)
    result = checkout(user_id="test-user")
    assert result["status"] in ["ok", "legacy-ok"]  # both paths must work
4. Stale flags living forever in the codebase¶
You release a feature via flag. It goes to 100%. You never remove the flag or the old code path. Six months later you have 40 flags in your code with unclear ownership, no one knows which are still active, and toggling any of them has unpredictable effects. The code is harder to read because every function has 3 conditional branches.
Fix: Set a cleanup date when you create the flag. Assign an owner. When a flag hits 100% and is stable for 1 week, create a ticket to remove it. Track stale flags with the LD Code References integration.
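A lightweight way to make cleanup dates actionable is to record them in a small registry that CI or a cron job scans for overdue entries. A sketch, with `FlagRecord` and `overdue_flags` as hypothetical names:

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class FlagRecord:
    key: str
    owner: str            # assigned at creation, never "unowned"
    cleanup_date: date    # set when the flag is created, not after launch

def overdue_flags(registry: list[FlagRecord], today: date) -> list[str]:
    """Flags at or past their cleanup date: candidates for a removal ticket."""
    return [f.key for f in registry if today >= f.cleanup_date]
```

A scheduled job that files a ticket per overdue key turns "we should clean up flags someday" into routine work.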
5. Evaluating flags inside tight loops¶
You have a function that processes 10,000 records in a batch. Inside the loop, you call client.variation("new-algorithm-v2", ctx, False) for every record. Each call may hit the network (if the SDK isn't properly caching). Your batch job goes from 2 seconds to 45 seconds.
Fix: Evaluate flags once outside the loop. Cache the result.
# BAD: evaluates flag 10,000 times
def process_records(records: list) -> list:
    return [process(r, use_new_algo=client.variation("new-algo", ctx, False)) for r in records]
# GOOD: evaluate once, use result
def process_records(records: list) -> list:
    use_new_algo = client.variation("new-algo", ctx, False)
    return [process(r, use_new_algo=use_new_algo) for r in records]
6. Using flag state in caching logic (cache poisoning)¶
You evaluate a flag, compute a result, and cache the result under a key. Later you change the flag value. Old results (from the previous flag state) are still in the cache. Users get stale behavior even though the flag changed.
Fix: Include the flag variation in your cache key, or invalidate the relevant cache keys when you change a flag.
# BAD: cache key doesn't include flag state
cache_key = f"search-results:{query}"
# GOOD: include flag variation in cache key
flag_variant = "v2" if client.variation("new-search", ctx, False) else "v1"
cache_key = f"search-results:{flag_variant}:{query}"
7. Flag targeting rules that use mutable user attributes¶
Your targeting rule: "if plan = 'premium' then return True." You store plan in the user's JWT. A user upgrades from free to premium. But their JWT still says free until it expires in 24 hours. The user paid for premium and can't access premium features.
Fix: Flag targeting attributes should come from authoritative, low-latency sources (session data refreshed on login, not long-lived JWTs). Or use short-lived tokens. Or target by user ID and maintain a list of premium user IDs in the flag targeting.
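A sketch of the first option: resolve the targeting attribute from the authoritative store and treat the JWT claim only as a fallback. The `billing_db` dict-style lookup stands in for your real billing service; the function name is illustrative:

```python
def plan_for_targeting(user_id, jwt_claims, billing_db):
    """Resolve the 'plan' attribute from the billing store, not the JWT.

    The JWT copy of 'plan' can be up to token-lifetime stale (hours);
    the billing record reflects an upgrade immediately.
    """
    plan = billing_db.get(user_id)
    if plan is not None:
        return plan
    # Fallback only when the authoritative source has no record.
    return jwt_claims.get("plan", "free")
```

Pass the returned value into the evaluation context, so a user who upgrades is targeted as premium on their very next request.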
8. Not monitoring flag evaluation errors¶
The LaunchDarkly SDK connects to a region that's having issues. Some evaluations fail. The SDK returns default values. Users are unexpectedly getting old behavior. You have no alerting on this and don't find out for 4 hours.
Fix: Use an error hook to track evaluation failures. Alert on elevated evaluation error rates.
class ErrorTrackingHook(Hook):
    def error(self, hook_context, exception, hints):
        metrics.increment(
            "feature_flag.evaluation_error",
            tags={"flag": hook_context.flag_key},
        )
        logger.warning(
            "Flag evaluation error",
            extra={"flag": hook_context.flag_key, "error": str(exception)},
        )
# Alert: if feature_flag.evaluation_error rate > 1% over 5 minutes, page on-call
9. Percentage rollout with the wrong bucketing attribute¶
You set up a 10% rollout bucketed by session ID (or request ID, or UUID generated per request). Result: 10% of requests get the new feature — but a single user will see the new feature on some page loads and the old feature on others. This is maddening for users and makes A/B test metrics meaningless.
Fix: Always bucket percentage rollouts by a stable user identifier (user ID, device ID, account ID). LaunchDarkly buckets by the context key by default (the targeting_key in OpenFeature terms), so make sure you're setting it to a stable value.
ctx = Context.builder(user_id).build()  # user_id must be stable across sessions
# NOT: ctx = Context.builder(str(uuid.uuid4())).build()  # random per request = bad
10. Deleting a flag from the system before removing it from code¶
You archive the flag in LaunchDarkly. The code still calls client.variation("deleted-flag", ctx, False). LaunchDarkly's SDK returns the default value. This is actually fine — but the behavior is invisible. No one knows the flag is gone. If the default value is wrong, you have a silent bug.
Fix: Always remove flag references from code before deleting the flag from the system. Use the LD Code References tool to find all usages before archiving.
# Find all flag references in codebase
grep -r "deleted-flag" src/ --include="*.py" --include="*.go" --include="*.js"
# LD Code References (run regularly in CI)
ld-find-code-refs --accessToken=$LD_API_KEY --projKey=my-project --dir=.
11. Using feature flags for authorization/security gates¶
You use a feature flag to gate admin functionality: if flag("admin-panel-enabled"). You accidentally enable the flag for all users during a bulk update. Every user now has admin access. Feature flag systems are not designed as authorization systems — they have no audit log for individual user actions, no RBAC, and evaluation can fail to the default (which might be True).
Fix: Use your application's authorization system for security gates (roles, permissions, ACLs). Feature flags are for user experience control, not access control. Flags can complement authorization (e.g., "show the beta UI to authorized beta users") but never replace it.
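A sketch of flags complementing, but never replacing, authorization; the function and role names are illustrative:

```python
def can_see_admin_panel(user: dict, flag_enabled: bool) -> bool:
    """Authorization decides access; the flag only controls exposure.

    Even if the flag is accidentally turned on for everyone, users
    without the admin role still cannot reach the panel. And if flag
    evaluation fails to a default of True, the role check still holds.
    """
    has_role = "admin" in user.get("roles", [])
    return has_role and flag_enabled
```

The key property: the flag can only narrow access granted by the permission system, never widen it.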
12. Not having a process for flag review at scale¶
You have 200 flags across 15 services. No one has reviewed them in a year. 60 are fully rolled out to 100% and could be cleaned up. 20 are in permanent "off" state for features that were abandoned. 10 are owned by engineers who left. No one knows which flags are safe to change.
Fix: Implement quarterly flag audits. Use LaunchDarkly's API to export all flags with metadata. Report on flags by age, evaluation count, and current rollout percentage. Assign ownership and expiry dates at creation time.
# Generate a flag audit report
curl -s -H "Authorization: $LD_API_KEY" \
"https://app.launchdarkly.com/api/v2/flags/my-project?limit=200" | \
jq -r '[.items[] | {
    key: .key,
    created: (.creationDate / 1000 | floor | gmtime | strftime("%Y-%m-%d")),
    production_on: (.environments.production.on | tostring),
    owner: (.maintainerId // "unassigned"),
    variation_count: (.variations | length)
  }]
  | (["key","created","production_on","owner","variation_count"] | @csv),
    (.[] | [.key, .created, .production_on, .owner, .variation_count] | @csv)' > flag-audit.csv