
Postmortem: Wildcard Ingress Rule Nearly Exposes Internal Admin Panel

ID: PM-027
Date: 2025-05-08
Severity: Near-Miss
Duration: 0m (no customer impact)
Time to Detect: 94m
Time to Mitigate: 22m
Customer Impact: None
Revenue Impact: None
Teams Involved: Platform Engineering, QA, Security, Backend Engineering
Postmortem Author: Theodora Blum
Postmortem Date: 2025-05-12

Executive Summary

On 2025-05-08 at 10:23 UTC, a Kubernetes Ingress resource containing a wildcard host: "*" rule was deployed to the staging cluster by a Platform Engineering engineer. The wildcard host matched every HTTP request that did not match a more specific ingress rule, routing it to the internal admin panel service — a service that relies entirely on ingress-layer hostname restriction for access control and has no independent authentication. The misconfiguration was confined to staging and was caught 94 minutes after deployment, when QA engineer Fatima Osei accidentally navigated to the admin panel while mistyping a staging URL. The ingress was corrected and redeployed within 22 minutes. Had the change been promoted to production in the next scheduled release (T-minus 6 hours), the internal admin panel would have been reachable from any hostname with no authentication required.

Timeline (All times UTC)

Time Event
09:45 Platform Engineering engineer Callum Reed deploys new ingress resource ingress-frontend-v2 to staging namespace via kubectl apply; intends to add a hostname for a new marketing subdomain
09:47 Callum verifies the new subdomain routes correctly; does not test other routes or review ingress precedence
10:07 Staging smoke test suite runs; passes all checks (smoke tests only verify the happy path for each named service, not catch-all behavior)
10:23 QA engineer Fatima Osei is testing a checkout flow on checkout-stg.cardinal-systems.io; mistypes URL as checkout-stg.cardinal-systems.io/adm1n
10:24 Fatima is served the internal admin panel directly, with no login prompt; the admin UI assumes all traffic reaching it has been pre-authenticated at the ingress layer
10:26 Fatima screenshots the page, opens Slack thread in #qa-bugs: "Why am I seeing the admin panel at a URL I just made up?"
10:29 Platform Engineering on-call (Yuki Hashimoto) joins the thread; immediately recognizes the ingress catch-all risk
10:31 Yuki runs kubectl get ingress -n staging and identifies ingress-frontend-v2 with host: "*"
10:33 Yuki notifies #sec-incidents; Security lead Theodora Blum joins
10:39 Callum fixes the ingress spec: replaces host: "*" with host: "marketing-stg.cardinal-systems.io"
10:51 Corrected ingress applied; Yuki and Fatima confirm admin panel is no longer reachable via arbitrary hostnames
11:10 Security confirms no external traffic reached the admin panel during the 94-minute window by reviewing staging load balancer access logs
11:30 Production release (T-6h) held pending completion of this postmortem and addition of ingress validation to CI pipeline
17:00 Release proceeds after the PM027-01 validation gate is merged to CI

Impact

Customer Impact

None — the misconfiguration was in the staging cluster only and was caught before any production promotion.

Internal Impact

  • Callum Reed (Platform Engineering): ~1 hour (fix, testing, postmortem participation)
  • Yuki Hashimoto (on-call): ~1.5 hours (investigation, coordination)
  • Fatima Osei (QA): ~30 minutes (reporting, verification)
  • Theodora Blum (Security): ~2 hours (access log audit, postmortem authorship)
  • Production release delayed by approximately 5.5 hours
  • Total: approximately 5 engineering-hours

Data Impact

None. Access logs confirm no external requests reached the admin panel during the window. The staging load balancer receives only internal QA and engineering traffic.

What Would Have Happened

If the wildcard ingress had been promoted to the production cluster in the 17:00 UTC release as originally scheduled, the ingress-admin-internal service would have been reachable via any hostname that did not match a more specific ingress rule. In practice, this means any HTTP request to the production load balancer IP with an arbitrary or misspelled Host: header would have been routed to the admin panel — a surface trivially discoverable by subdomain enumeration tools or even accidental user navigation.

The admin panel (admin.internal.cardinal-systems.io, internal-only in normal operation) provides unrestricted access to: user account management (create, delete, role assignment); feature flag controls affecting all 340,000 active users; database admin UI (read access to all tables via pgAdmin-style interface); and internal API key management. Critically, the admin panel was built with the assumption that all traffic reaching it had already been authenticated at the ingress layer. There is no secondary login prompt, no CSRF protection, and no rate limiting. An attacker reaching the panel from the internet would have had full administrative capability without any credentials.

Cardinal Systems processes payment card data under PCI DSS scope. Unauthorized access to the database admin UI — which includes raw access to the orders and payment_methods tables — would have constituted a reportable PCI DSS breach. The likely attacker scenario is automated: subdomain permutation scanners regularly probe load balancer IPs with wordlist-based Host: headers. Cardinal Systems' production load balancer is indexed in Shodan; exposure would likely have been discovered within hours of the production release.

Root Cause

What Happened (Technical)

Callum was adding an ingress rule for a new marketing subdomain (marketing-stg.cardinal-systems.io). Rather than patching the existing Ingress, he created a new resource from a template that used host: "*" as a placeholder. He replaced the placeholder in the rules[0].host field, but the template also contained a second rule block — a catch-all — that he did not remove. The resulting manifest had:

rules:
  - host: "marketing-stg.cardinal-systems.io"
    http:
      paths:
        - path: /
          backend:
            service:
              name: frontend-marketing
  - host: "*"
    http:
      paths:
        - path: /
          backend:
            service:
              name: admin-internal

The second rule was inherited verbatim from the template, which had used it as an example of catch-all routing. In nginx-ingress, a host: "*" rule is functionally equivalent to no host restriction — it matches any Host: header value not claimed by a more specific rule. Because admin-internal happened to be the backend in the template's example catch-all, it became the unintended backend for all unmatched traffic.

Kubernetes does not reject or warn on wildcard host ingress rules at admission time (with default configuration). The kubectl apply succeeded cleanly. The ingress controller accepted and activated the rule within seconds.
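This admission-time gap is what action item PM027-01 closes. A minimal Kyverno ClusterPolicy for the check might look like the following sketch; the policy name and the exact JMESPath expression are ours, and the syntax should be validated against the Kyverno version running in the clusters:

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: disallow-wildcard-ingress-host   # hypothetical name
spec:
  validationFailureAction: Enforce       # reject the request, don't just audit it
  rules:
    - name: no-wildcard-host
      match:
        any:
          - resources:
              kinds:
                - Ingress
      validate:
        message: 'Ingress rules must set an explicit host; host: "*" is forbidden (see PM-027).'
        deny:
          conditions:
            any:
              # Collect every rules[].host in the submitted Ingress and
              # deny the request if any of them is the bare wildcard.
              - key: "{{ request.object.spec.rules[].host || `[]` }}"
                operator: AnyIn
                value: ["*"]
```

With validationFailureAction set to Enforce, a kubectl apply of a manifest like the one above fails at admission instead of activating silently.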

Contributing Factors

  1. Catch-all rule present in the ingress template: The template used for new ingress resources contained a host: "*" example rule as documentation scaffolding. Templates with executable but dangerous placeholder content are hazardous — engineers working under time pressure frequently apply them with incomplete edits.
  2. Admin panel relies solely on ingress hostname restriction for auth: The admin service had no independent authentication layer. This is a single point of failure: any misconfiguration at the ingress level translates directly to unauthenticated admin access. Defense-in-depth requires the admin panel to enforce its own authn regardless of where traffic originates.
  3. CI/CD pipeline lacked ingress manifest validation: No admission webhook or CI linting step checked for wildcard host values in ingress resources. The smoke test suite validated service-specific routes but did not test for unintended catch-all behavior.
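Contributing factor 2 is what PM027-02 addresses. One common defense-in-depth pattern (shown here only as a sketch, not as the design Backend Engineering has committed to) is to have the ingress controller consult an external auth service for every request, e.g. oauth2-proxy via ingress-nginx's auth annotations. The oauth2-proxy hostname and service port below are hypothetical:

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: ingress-admin-internal
  annotations:
    # ingress-nginx calls auth-url for every request; anything other than
    # a 2xx blocks the request, and browsers are redirected to auth-signin.
    nginx.ingress.kubernetes.io/auth-url: "https://oauth2-proxy.internal.cardinal-systems.io/oauth2/auth"
    nginx.ingress.kubernetes.io/auth-signin: "https://oauth2-proxy.internal.cardinal-systems.io/oauth2/start?rd=$escaped_request_uri"
spec:
  rules:
    - host: "admin.internal.cardinal-systems.io"   # explicit host, never "*"
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: admin-internal
                port:
                  number: 80   # hypothetical service port
```

Even with this in place, the panel should still enforce its own session check (the OAuth2 or mTLS option in PM027-02), so that a future ingress mistake is again not a single point of failure.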

What We Got Lucky About

  1. Fatima mistyped a URL. The discovery was entirely accidental. Fatima was not performing any security testing; she hit the admin panel because of a typo in her browser's address bar. If she had not made that specific typo — landing on a path the admin panel's router happened to render rather than returning a 404 — the misconfiguration would have gone undetected until the production promotion 6 hours later.
  2. The window was during business hours with low external traffic to staging. The staging load balancer is not publicly advertised and receives traffic only from the QA and engineering networks during business hours. If the wildcard ingress had persisted overnight or been deployed to production first, the exposure window under realistic external traffic would have been far longer before anyone noticed.

Detection

How We Detected

Accidental discovery by QA engineer Fatima Osei during routine testing. Fatima mistyped a staging URL, received an unexpected admin panel response, and recognized the anomaly as worth escalating.

Why This Almost Wasn't Caught

No automated check validates ingress manifests for wildcard host rules before or after deployment. The CI smoke test suite tests that each named service is reachable at its expected hostname — it does not test that unexpected hostnames do not reach sensitive backends. Kubernetes admission control in the staging cluster has no OPA/Kyverno policy to reject or flag host: "*" in ingress specs. Without Fatima's accidental discovery, the change would have passed all automated gates and been promoted to production.

Response

What Went Well

  1. Once reported in Slack, the on-call engineer identified the root cause (wildcard ingress rule) within 5 minutes by running a single kubectl get ingress command — fast and effective.
  2. The production release was correctly held pending resolution rather than proceeding on schedule. The 5.5-hour delay was the right trade-off given the severity of what was narrowly avoided.

What Could Have Gone Better

  1. Callum did not review the full rendered manifest before applying. A kubectl diff or dry-run against the cluster would have highlighted the second rule block. This is a habitual gap, not a one-time mistake.
  2. The team had no ingress-specific validation in CI. The gap was known (a backlog item existed) but had not been prioritized. This near-miss is the forcing function it needed.

Action Items

  • PM027-01 (P0, Owner: Platform Engineering, Status: Completed, Due: 2025-05-08): Add OPA/Kyverno admission policy to staging and production clusters rejecting ingress resources with host: "*"
  • PM027-02 (P0, Owner: Backend Engineering, Status: In Progress, Due: 2025-05-30): Add independent authentication (OAuth2 proxy or mTLS) to the admin panel service; remove reliance on ingress hostname restriction as the sole auth layer
  • PM027-03 (P1, Owner: Callum Reed, Status: Completed, Due: 2025-05-09): Remove catch-all host: "*" rule from all ingress templates; replace with a commented documentation note explaining the pattern and why it is dangerous
  • PM027-04 (P1, Owner: Platform Engineering, Status: In Progress, Due: 2025-05-16): Add CI linting step (conftest or kubeval plus a custom rule) that fails the pipeline if any staged ingress manifest contains host: "*"
  • PM027-05 (P2, Owner: QA, Status: Open, Due: 2025-05-23): Extend smoke test suite to assert that a curated list of sensitive service backends (admin, internal API, database proxy) is NOT reachable via arbitrary Host headers

Lessons Learned

  1. A single authentication layer is not defense-in-depth. When the admin panel's only access control is "the ingress will filter out bad traffic," any ingress misconfiguration becomes a full authentication bypass. Sensitive services must enforce their own authn independent of the network layer that delivers traffic to them.
  2. Templates with runnable placeholder content are accidents waiting to happen. A template that contains host: "*" as an example will eventually be applied before the engineer removes it. Templates should use values that fail loudly (e.g., host: "REPLACE_ME.example.com") rather than values that silently do something dangerous.
  3. Negative-path testing is as important as positive-path testing. The smoke suite verified that correct routes work. It did not verify that incorrect routes fail. For security-sensitive services, explicitly testing that they are unreachable from unexpected paths is a necessary part of the test plan.
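Lesson 2 can be applied directly to the template that caused this incident (PM027-03): the example rule should use a placeholder that fails loudly if applied unedited. A sketch of such a template block (the placeholder names are deliberately invalid):

```yaml
rules:
  # REPLACE_ME values are deliberately conspicuous: an unedited apply
  # either fails validation or routes nowhere, instead of silently
  # matching every Host: header the way host: "*" did in PM-027.
  - host: "REPLACE_ME.example.com"
    http:
      paths:
        - path: /
          pathType: Prefix
          backend:
            service:
              name: REPLACE_ME-service
              port:
                number: 80
# NOTE: do not add a catch-all rule (host: "*") here; see PM-027 for
# why catch-all routing to a real backend is dangerous.
```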

Cross-References

  • Failure Pattern: Misconfigured Network Policy / Catch-All Route Exposure
  • Topic Packs: Kubernetes Ingress, Admission Control, Zero-Trust Network Architecture, Defense in Depth
  • Runbook: K8S-RB-014 — Ingress Misconfiguration Response; SEC-RB-011 — Unintended Service Exposure
  • Decision Tree: Security Triage → Unintended Service Exposure → Is it in prod? → No → Patch ingress, audit access logs, evaluate prod promotion gate