Progressive Hints
Hint 1 (after 5 min)
Look at the two kubectl get configmap outputs. The same ConfigMap (feature-flags) has different values depending on which context you query. From the default context: enable_new_checkout: "true", max_cart_items: "50". From the read-replica context: enable_new_checkout: "false", max_cart_items: "25". The resourceVersion is also different (289102 vs 288847).
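The divergence can be made concrete by diffing the two data sections. A minimal sketch, with the values hand-copied from the outputs quoted above rather than fetched live (a real check would pull both objects through the API with each context's credentials):

```python
# ConfigMap contents as reported by `kubectl get configmap feature-flags`
# in each context (hand-copied from the hint; not fetched from a cluster).
default_ctx = {
    "resourceVersion": "289102",
    "data": {"enable_new_checkout": "true", "max_cart_items": "50"},
}
read_replica_ctx = {
    "resourceVersion": "288847",
    "data": {"enable_new_checkout": "false", "max_cart_items": "25"},
}

def diff_configmaps(a: dict, b: dict) -> dict:
    """Return keys whose values differ between the two data sections."""
    keys = set(a["data"]) | set(b["data"])
    return {
        k: (a["data"].get(k), b["data"].get(k))
        for k in keys
        if a["data"].get(k) != b["data"].get(k)
    }

drift = diff_configmaps(default_ctx, read_replica_ctx)
print(drift)  # both keys differ between the two contexts
print(default_ctx["resourceVersion"] != read_replica_ctx["resourceVersion"])  # True
```

Both flags differ and the resourceVersions disagree, which is what rules out a simple display glitch: the two contexts are genuinely returning different object states.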
Hint 2 (after 10 min)
The etcd cluster currently looks healthy: a leader is elected and all members are in sync at raft index 289341. But etcd_server_leader_changes_seen_total is 12, which is high, and etcd_server_proposals_failed_total is 847, so proposals failed at some point. The log shows a leadership transfer at 03:22; there was a period of instability. Now compare the Terraform state: it records enable_new_checkout = "false" at resourceVersion: "288847", but the live ConfigMap has enable_new_checkout = "true" at resourceVersion: "289102". Someone changed the ConfigMap by hand after Terraform applied it.
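The comparison between Terraform's recorded state and the live object is the drift check itself. A minimal sketch, again using the hand-copied values from above; note that a newer resourceVersion alone does not prove an out-of-band edit (controllers bump it too), so the sketch keys off the data mismatch:

```python
# What Terraform state recorded at apply time vs what the live API returns now.
# Values are hand-copied from the hint; a real check would read the tfstate
# file and query the API server.
terraform_state = {
    "resourceVersion": "288847",
    "data": {"enable_new_checkout": "false"},
}
live_object = {
    "resourceVersion": "289102",
    "data": {"enable_new_checkout": "true"},
}

def drifted(state: dict, live: dict) -> bool:
    """Drift = live data no longer matches what Terraform last applied.
    The resourceVersion difference is a hint, but the data mismatch is
    what confirms a manual edit rather than a routine metadata bump."""
    return state["data"] != live["data"]

print(drifted(terraform_state, live_object))  # True: edited after apply
```

In practice `terraform plan` performs this same refresh-and-compare and would report the ConfigMap as changed outside of Terraform.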
Hint 3 (after 15 min)
The full picture: this is a platform whose feature flags live in a Kubernetes ConfigMap. Terraform defines the ConfigMap with enable_new_checkout = "false". During a window of etcd instability (12 leader changes, 847 failed proposals, possibly a network partition), someone manually edited the ConfigMap with kubectl edit, flipping enable_new_checkout to true and raising max_cart_items to 50. The etcd cluster has since healed and currently shows consistent state, but:
1. The live ConfigMap no longer matches the Terraform state (drift)
2. The read-replica context may be hitting a stale API server cache or a different cluster
3. The application is serving the new checkout flow (v2_new) which was not supposed to be enabled yet
4. Next terraform apply will revert the manual change, potentially breaking the live checkout flow
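Consequence 4 can be sketched as a simulation of what apply does: Terraform overwrites the keys it manages with the values from its configuration, silently undoing the manual edit. The Terraform-side value for max_cart_items ("25") is an assumption here, inferred from the read-replica output; only enable_new_checkout = "false" is stated explicitly in the hints:

```python
# Sketch of what the next `terraform apply` would do to the live ConfigMap.
# desired = the Terraform configuration; "25" for max_cart_items is an
# assumption inferred from the read-replica output.
desired = {"enable_new_checkout": "false", "max_cart_items": "25"}
live = {"enable_new_checkout": "true", "max_cart_items": "50"}  # manual edit

def simulate_apply(desired: dict, live: dict) -> dict:
    """Terraform reconciles: managed keys take the desired values."""
    return {**live, **desired}

after = simulate_apply(desired, live)
print(after["enable_new_checkout"])  # "false": the live checkout flag flips back
```

This is why the safe path is to resolve the drift deliberately (either update the Terraform config to match the intended live state, or revert the manual edit in a controlled change) before anyone runs apply.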