# Postmortem: Wildcard Ingress Rule Nearly Exposes Internal Admin Panel
| Field | Value |
|---|---|
| ID | PM-027 |
| Date | 2025-05-08 |
| Severity | Near-Miss |
| Duration | 0m (no customer impact) |
| Time to Detect | 94m |
| Time to Mitigate | 22m |
| Customer Impact | None |
| Revenue Impact | None |
| Teams Involved | Platform Engineering, QA, Security, Backend Engineering |
| Postmortem Author | Theodora Blum |
| Postmortem Date | 2025-05-12 |
## Executive Summary
On 2025-05-08 at 10:23 UTC, a Kubernetes Ingress resource containing a wildcard host rule (host: "*") was deployed to the staging cluster by a Platform Engineering engineer. The wildcard rule matched every HTTP request that did not match a more specific ingress rule, routing it to the internal admin panel service — a service that relies entirely on ingress-layer hostname restriction for access control and has no independent authentication. The misconfiguration was confined to staging and was caught 94 minutes after deployment, when QA engineer Fatima Osei reached the admin panel by mistyping a staging URL. The ingress was corrected and redeployed within 22 minutes. Had the change been promoted to production in the next scheduled release (6 hours later), the internal admin panel would have been reachable from any hostname with no authentication required.
## Timeline (All times UTC)
| Time | Event |
|---|---|
| 09:45 | Platform Engineering engineer Callum Reed deploys new ingress resource ingress-frontend-v2 to staging namespace via kubectl apply; intends to add a hostname for a new marketing subdomain |
| 09:47 | Callum verifies the new subdomain routes correctly; does not test other routes or review ingress precedence |
| 10:07 | Staging smoke test suite runs; passes all checks (smoke tests only verify the happy path for each named service, not catch-all behavior) |
| 10:23 | QA engineer Fatima Osei is testing a checkout flow on checkout-stg.cardinal-systems.io; mistypes URL as checkout-stg.cardinal-systems.io/adm1n |
| 10:24 | Fatima is served the internal admin panel with no login prompt — the admin UI renders directly, assuming all traffic has been pre-authenticated at the ingress layer |
| 10:26 | Fatima screenshots the page, opens Slack thread in #qa-bugs: "Why am I seeing the admin panel at a URL I just made up?" |
| 10:29 | Platform Engineering on-call (Yuki Hashimoto) joins the thread; immediately recognizes the ingress catch-all risk |
| 10:31 | Yuki runs kubectl get ingress -n staging and identifies ingress-frontend-v2 with host: "*" |
| 10:33 | Yuki notifies #sec-incidents; Security lead Theodora Blum joins |
| 10:39 | Callum fixes the ingress spec: replaces host: "*" with host: "marketing-stg.cardinal-systems.io" |
| 10:51 | Corrected ingress applied; Yuki and Fatima confirm admin panel is no longer reachable via arbitrary hostnames |
| 11:10 | Security confirms no external traffic reached the admin panel during the 94-minute window by reviewing staging load balancer access logs |
| 11:30 | Production release (T-6h) held pending completion of this postmortem and addition of ingress validation to CI pipeline |
| 17:00 | Release proceeds after PM027-01 validation gate is merged to CI |
## Impact
### Customer Impact
None — the misconfiguration was in the staging cluster only and was caught before any production promotion.
### Internal Impact
- Callum Reed (Platform Engineering): ~1 hour (fix, testing, postmortem participation)
- Yuki Hashimoto (on-call): ~1.5 hours (investigation, coordination)
- Fatima Osei (QA): ~30 minutes (reporting, verification)
- Theodora Blum (Security): ~2 hours (access log audit, postmortem authorship)
- Production release delayed by approximately 5.5 hours
- Total: approximately 5 engineering-hours
### Data Impact
None. Access logs confirm no external requests reached the admin panel during the window. The staging load balancer receives only internal QA and engineering traffic.
## What Would Have Happened
If the wildcard ingress had been promoted to the production cluster in the 17:00 UTC release as originally scheduled, the ingress-admin-internal service would have been reachable via any hostname that did not match a more specific ingress rule. In practice, this means any HTTP request to the production load balancer IP with an arbitrary or misspelled Host: header would have been routed to the admin panel — a surface trivially discoverable by subdomain enumeration tools or even accidental user navigation.
The admin panel (admin.internal.cardinal-systems.io, internal-only in normal operation) provides unrestricted access to: user account management (create, delete, role assignment); feature flag controls affecting all 340,000 active users; database admin UI (read access to all tables via pgAdmin-style interface); and internal API key management. Critically, the admin panel was built with the assumption that all traffic reaching it had already been authenticated at the ingress layer. There is no secondary login prompt, no CSRF protection, and no rate limiting. An attacker reaching the panel from the internet would have had full administrative capability without any credentials.
Cardinal Systems processes payment card data under PCI DSS scope. Unauthorized access to the database admin UI — which includes raw access to the orders and payment_methods tables — would have constituted a reportable PCI DSS breach. The likely attacker scenario is automated: subdomain permutation scanners regularly probe load balancer IPs with wordlist-based Host: headers. Cardinal Systems' production load balancer is indexed in Shodan; exposure would likely have been discovered within hours of the production release.
## Root Cause
### What Happened (Technical)
Callum was adding an ingress rule for a new marketing subdomain (marketing-stg.cardinal-systems.io). He created a new Ingress resource rather than patching the existing one, working from a template that used host: "*" as a placeholder. He replaced the placeholder in the rules[0].host field but the template also had a second rule block — a catch-all — that he did not remove. The resulting manifest had:
```yaml
rules:
  - host: "marketing-stg.cardinal-systems.io"
    http:
      paths:
        - path: /
          backend:
            service:
              name: frontend-marketing
  - host: "*"
    http:
      paths:
        - path: /
          backend:
            service:
              name: admin-internal
```
The second rule was inherited verbatim from the template, which had used it as an example of catch-all routing. In nginx-ingress, a host: "*" rule is functionally equivalent to no host restriction — it matches any Host: header value not claimed by a more specific rule. Because admin-internal happened to be the backend in the template's example catch-all, it became the unintended backend for all unmatched traffic.
Kubernetes does not reject or warn on wildcard host ingress rules at admission time (with default configuration). The kubectl apply succeeded cleanly. The ingress controller accepted and activated the rule within seconds.
### Contributing Factors
- Catch-all rule present in the ingress template: The template used for new ingress resources contained a host: "*" example rule as documentation scaffolding. Templates with executable but dangerous placeholder content are hazardous — engineers working under time pressure frequently apply them with incomplete edits.
- Admin panel relies solely on ingress hostname restriction for auth: The admin service had no independent authentication layer. This is a single point of failure: any misconfiguration at the ingress level translates directly to unauthenticated admin access. Defense-in-depth requires the admin panel to enforce its own authn regardless of where traffic originates.
- CI/CD pipeline lacked ingress manifest validation: No admission webhook or CI linting step checked for wildcard host values in ingress resources. The smoke test suite validated service-specific routes but did not test for unintended catch-all behavior.
### What We Got Lucky About
- Fatima mistyped a URL. The discovery was entirely accidental. Fatima was not performing any security testing; she hit the admin panel because of a typo in her browser's address bar. If she had not made that specific typo — landing on a path the admin panel's router happened to render rather than returning a 404 — the misconfiguration would have gone undetected until the production promotion 6 hours later.
- The window was during business hours with low external traffic to staging. The staging load balancer is not publicly advertised and receives traffic only from the QA and engineering networks during business hours. If the wildcard ingress had persisted overnight or been deployed to production first, the exposure window under realistic external traffic would have been far longer before anyone noticed.
## Detection
### How We Detected
Accidental discovery by QA engineer Fatima Osei during routine testing. Fatima mistyped a staging URL, received an unexpected admin panel response, and recognized the anomaly as worth escalating.
### Why This Almost Wasn't Caught
No automated check validates ingress manifests for wildcard host rules before or after deployment. The CI smoke test suite tests that each named service is reachable at its expected hostname — it does not test that unexpected hostnames do not reach sensitive backends. Kubernetes admission control in the staging cluster has no OPA/Kyverno policy to reject or flag host: "*" in ingress specs. Without Fatima's accidental discovery, the change would have passed all automated gates and been promoted to production.
## Response
### What Went Well
- Once reported in Slack, the on-call engineer identified the root cause (wildcard ingress rule) within 5 minutes by running a single kubectl get ingress command — fast and effective.
- The production release was correctly held pending resolution rather than proceeding on schedule. The 5.5-hour delay was the right trade-off given the severity of what was narrowly avoided.
### What Could Have Gone Better
- Callum did not review the full rendered manifest before applying. A kubectl diff or dry-run against the cluster would have highlighted the second rule block. This is a habitual gap, not a one-time mistake.
- The team had no ingress-specific validation in CI. The gap was known (a backlog item existed) but had not been prioritized. This near-miss is the forcing function it needed.
## Action Items
| ID | Action | Priority | Owner | Status | Due Date |
|---|---|---|---|---|---|
| PM027-01 | Add OPA/Kyverno admission policy to staging and production clusters rejecting ingress resources with host: "*" | P0 | Platform Engineering | Completed | 2025-05-08 |
| PM027-02 | Add independent authentication (OAuth2 proxy or mTLS) to the admin panel service; remove reliance on ingress hostname restriction as sole auth layer | P0 | Backend Engineering | In Progress | 2025-05-30 |
| PM027-03 | Remove catch-all host: "*" rule from all ingress templates; replace with commented documentation note explaining the pattern and why it is dangerous | P1 | Callum Reed | Completed | 2025-05-09 |
| PM027-04 | Add CI linting step (conftest or kubeval + custom rule) that fails the pipeline if any staged ingress manifest contains host: "*" | P1 | Platform Engineering | In Progress | 2025-05-16 |
| PM027-05 | Extend smoke test suite to assert that a curated list of sensitive service backends (admin, internal API, database proxy) are NOT reachable via arbitrary host headers | P2 | QA | Open | 2025-05-23 |
## Lessons Learned
- A single authentication layer is not defense-in-depth. When the admin panel's only access control is "the ingress will filter out bad traffic," any ingress misconfiguration becomes a full authentication bypass. Sensitive services must enforce their own authn independent of the network layer that delivers traffic to them.
- Templates with runnable placeholder content are accidents waiting to happen. A template that contains host: "*" as an example will eventually be applied before the engineer removes it. Templates should use values that fail loudly (e.g., host: "REPLACE_ME.example.com") rather than values that silently do something dangerous.
- Negative-path testing is as important as positive-path testing. The smoke suite verified that correct routes work. It did not verify that incorrect routes fail. For security-sensitive services, explicitly testing that they are unreachable from unexpected paths is a necessary part of the test plan.
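The negative-path idea behind PM027-05 can be sketched as a pure check the smoke suite applies to the response from a bogus-Host request; the marker strings and hostnames here are illustrative assumptions, not the real page content or suite code:

```python
# Sketch of PM027-05's negative-path assertion. The smoke suite would send a
# request with a made-up Host header (e.g. curl -H 'Host: nonsense.example'
# against the staging load balancer) and feed the response through this check.

# Strings that would only appear in sensitive backends (assumed values).
SENSITIVE_MARKERS = ("admin-internal", "pgAdmin", "API key management")

def catch_all_exposed(status_code: int, body: str) -> bool:
    """True if a bogus-Host request was served a sensitive backend's page.

    A 404 (or 421) from the ingress default backend is the desired outcome;
    a 2xx whose body looks like a sensitive service is a test failure.
    """
    if not 200 <= status_code < 300:
        return False
    return any(marker in body for marker in SENSITIVE_MARKERS)
```

The suite would assert `catch_all_exposed(...) is False` for every hostname in a small wordlist of misspellings and random labels, making the catch-all behavior itself part of the tested contract rather than an untested assumption.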
## Cross-References
- Failure Pattern: Misconfigured Network Policy / Catch-All Route Exposure
- Topic Packs: Kubernetes Ingress, Admission Control, Zero-Trust Network Architecture, Defense in Depth
- Runbook: K8S-RB-014 — Ingress Misconfiguration Response; SEC-RB-011 — Unintended Service Exposure
- Decision Tree: Security Triage → Unintended Service Exposure → Is it in prod? → No → Patch ingress, audit access logs, evaluate prod promotion gate