
Corporate IT Fluency - Street-Level Ops

You have a change to deploy. Here is the minimum viable change request:

CHANGE REQUEST TEMPLATE (what CAB expects)
──────────────────────────────────────────
Title:          Increase PostgreSQL max_connections from 100 to 300
Change Type:    Normal
Risk Level:     Medium
Environment:    Production - payment-db-01
Requestor:      Your Name / Your Team
Scheduled:      2026-03-21 02:00-04:00 UTC (maintenance window)

DESCRIPTION:
  Connection pool exhaustion causing P2 incidents (INC-4421, INC-4428,
  INC-4435). Root cause confirmed in PRB-0891. This change increases
  the connection limit and adds PgBouncer for connection pooling.

IMPACT ASSESSMENT:
  - Services affected: payment-api, order-service
  - Downtime expected: ~5 minutes during PostgreSQL restart
  - Users affected: Internal API consumers (no customer-facing impact)

IMPLEMENTATION PLAN:
  1. Notify service owners via #payments-oncall
  2. Deploy PgBouncer config via Ansible (pre-tested in staging)
  3. Update postgresql.conf max_connections=300
  4. Restart PostgreSQL (graceful drain first)
  5. Verify connections via: SELECT count(*) FROM pg_stat_activity;
  6. Smoke test payment-api health endpoint

ROLLBACK PLAN:
  1. Revert postgresql.conf to max_connections=100
  2. Remove PgBouncer config
  3. Restart PostgreSQL
  Estimated rollback time: 10 minutes

TEST EVIDENCE:
  - Staging test: 2026-03-18, no issues (link to results)
  - Load test: 2026-03-19, 500 concurrent connections stable
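
A quick pre-submission check can catch an incomplete request before CAB does. The sketch below is a minimal illustration, assuming the request is held as a plain dictionary keyed by the section names in the template above. `REQUIRED_SECTIONS` and `missing_sections` are hypothetical names, not part of any ticketing tool's API.

```python
# Hypothetical pre-submission check; section names mirror the template above.
REQUIRED_SECTIONS = (
    "title",
    "change_type",
    "risk_level",
    "environment",
    "requestor",
    "scheduled",
    "description",
    "impact_assessment",
    "implementation_plan",
    "rollback_plan",
    "test_evidence",
)


def missing_sections(request: dict) -> list[str]:
    """Return the required sections that are absent or blank."""
    missing = []
    for name in REQUIRED_SECTIONS:
        value = request.get(name)
        # None, blank strings, and empty lists all count as "not filled in".
        is_blank_text = isinstance(value, str) and not value.strip()
        if value is None or is_blank_text or value == []:
            missing.append(name)
    return missing


draft = {
    "title": "Increase PostgreSQL max_connections from 100 to 300",
    "change_type": "Normal",
    "risk_level": "Medium",
    "environment": "Production - payment-db-01",
    "requestor": "Your Name / Your Team",
    "scheduled": "2026-03-21 02:00-04:00 UTC",
    "description": "Connection pool exhaustion causing P2 incidents.",
    "impact_assessment": ["~5 minutes downtime during PostgreSQL restart"],
    "implementation_plan": ["Notify service owners", "Restart PostgreSQL"],
    "rollback_plan": [],  # left empty on purpose to show the check firing
    "test_evidence": ["Staging test: 2026-03-18, no issues"],
}

print(missing_sections(draft))
```

With the draft above, only the empty rollback plan is reported, which is the section CAB is most likely to push back on.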

CAB Presentation Tips

  • Keep it under 3 minutes. CAB reviews 10-20 changes per session.
  • Lead with risk and impact, not technical details. The board cares about "what breaks if this goes wrong" more than "how the connection pooler works."
  • Have your rollback plan ready. "We'll roll back" is not a rollback plan. Specific steps, estimated time, and who executes.
  • Link to evidence. Test results, staging runs, incident tickets that justify the change.

Writing a RACI That Actually Gets Used

Most RACIs die in a spreadsheet. Here is how to make one that works:

RACI: Database Migration to RDS
────────────────────────────────

                        DBA     Dev Lead    Security    Infra     VP Eng
                        ───     ────────    ────────    ─────     ──────
Capacity planning       R       C           I           R         I
Schema migration script R       R           I           I         I
Security review         C       C           R           C         A
Data migration execute  R       I           I           C         I
Validation testing      R       R           C           I         I
DNS cutover             C       I           I           R         I
Go/no-go decision       C       C           C           C         A
Rollback (if needed)    R       I           I           R         I
Postmortem              R       R           C           R         I

KEY RULES:
  - Exactly ONE "A" per row (if everyone is accountable, no one is)
  - "R" does the work; "A" owns the outcome
  - "C" means you ask them BEFORE acting
  - "I" means you tell them AFTER
  - If a row has no "R", nobody is doing the work
  - If you're "C" and nobody consulted you, escalate
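
The first and fifth rules are mechanical enough to check automatically. The following is a minimal sketch, assuming the matrix is stored as one tuple of letters per task in the same column order as the table above. `check_raci` is a hypothetical helper, and the three sample rows are copied from that table.

```python
# Column order matches the matrix above.
ROLES = ("DBA", "Dev Lead", "Security", "Infra", "VP Eng")
VALID_CODES = {"R", "A", "C", "I"}


def check_raci(matrix: dict[str, tuple[str, ...]]) -> list[str]:
    """Apply the 'exactly one A' and 'at least one R' rules to every row."""
    problems = []
    for task, codes in matrix.items():
        # Reject rows with the wrong number of columns or unknown letters.
        if len(codes) != len(ROLES) or not set(codes) <= VALID_CODES:
            problems.append(f"{task}: malformed row {codes}")
            continue
        accountable = codes.count("A")
        if accountable != 1:
            problems.append(f"{task}: expected exactly one A, found {accountable}")
        if "R" not in codes:
            problems.append(f"{task}: no R, so nobody is doing the work")
    return problems


sample = {
    "Capacity planning": ("R", "C", "I", "R", "I"),
    "Security review": ("C", "C", "R", "C", "A"),
    "Go/no-go decision": ("C", "C", "C", "C", "A"),
}

for problem in check_raci(sample):
    print(problem)
```

Applied literally, the check flags "Capacity planning" for having no A and "Go/no-go decision" for having no R. Some teams let the A double as the R on pure decision rows; if yours does, relax the second check rather than ignore its output.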

Gotcha: Reading the Room in Corporate Meetings

Signals That a Decision Has Already Been Made

  • The meeting invite says "discuss" but the slide deck says "announcing"
  • The most senior person speaks first and states a position
  • Someone says "I think we're all aligned" before anyone has spoken
  • Action items were pre-written before the meeting started

What to do: If you disagree, say so clearly and concisely during the meeting. After the meeting, the decision is final unless you escalate formally. "I didn't get a chance to say anything" is not a valid objection after the fact.

Signals That Nobody Has Decided

  • Meeting ends with "let's circle back" and no action items
  • Multiple people claim to be the decision-maker
  • The same topic appears in three consecutive weekly meetings
  • Someone suggests "forming a working group"

What to do: Ask directly: "Who is the decision-maker for this? What is the deadline?" If nobody can answer, escalate to your manager that the decision is blocked.


Translating Tech Speak to Business Speak

When you need to communicate with non-technical stakeholders, translate:

You Think                         You Say
─────────                         ───────
"The server crashed"              "Service X experienced an unplanned outage affecting Y users for Z minutes"
"We need more servers"            "Current capacity supports N requests/sec; projected growth requires M by Q3. Cost: $X/month"
"The code is bad"                 "Technical debt in the payment module is increasing incident frequency: 3 P2s this quarter vs 1 last quarter"
"We should use Kubernetes"        "Container orchestration would reduce deployment time from 2 hours to 10 minutes and improve service reliability"
"I need root access"              "I need elevated access to the production database to resolve the active P1 incident. Access will be revoked after resolution per policy"
"This deadline is unrealistic"    "To deliver by that date we'd need to cut X and Y from scope. With full scope, the earliest realistic date is Z"

The ROI Pitch Template

When you need budget approval for a technical project:

PROJECT: Implement centralized log aggregation (ELK stack)

CURRENT STATE (cost of doing nothing):
  - Engineers spend ~5h/week per team SSH-ing into servers to grep logs
  - 3 teams × 5h/week × 52 weeks × $75/hr = $58,500/year in lost productivity
  - Average incident MTTR: 45 minutes (30% spent finding logs)

PROPOSED STATE:
  - Centralized search: log queries in seconds, not minutes
  - Estimated MTTR reduction: 30% → saves ~13 min per incident
  - 200 incidents/year × 13 min × $150/hr (incident cost) = $6,500/year

COST:
  - Infrastructure: $1,200/month ($14,400/year)
  - Implementation: 3 weeks engineering time (~$12,000)
  - First-year total: $26,400

ROI:
  - Annual savings: $58,500 + $6,500 = $65,000
  - First-year net: $65,000 - $26,400 = $38,600
  - Payback period: ~5 months
  - Ongoing annual net: $65,000 - $14,400 = $50,600
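
The arithmetic in the template is easy to get wrong when the inputs change, so it can help to keep it in a small script. This is a minimal sketch that reuses the figures from the pitch above; the constant names and `roi_summary` are illustrative, not a standard formula library.

```python
# Inputs copied from the pitch above; swap in your own figures.
TEAMS = 3
LOG_HOURS_PER_TEAM_PER_WEEK = 5
WEEKS_PER_YEAR = 52
ENGINEER_RATE = 75               # $/hr
INCIDENTS_PER_YEAR = 200
MINUTES_SAVED_PER_INCIDENT = 13
INCIDENT_COST_RATE = 150         # $/hr
INFRA_PER_MONTH = 1_200
IMPLEMENTATION_COST = 12_000


def roi_summary() -> dict[str, float]:
    """Recompute the savings, cost, and payback lines of the pitch."""
    productivity = (
        TEAMS * LOG_HOURS_PER_TEAM_PER_WEEK * WEEKS_PER_YEAR * ENGINEER_RATE
    )
    # Multiply first, then divide by 60 to convert minutes to hours.
    incident = (
        INCIDENTS_PER_YEAR * MINUTES_SAVED_PER_INCIDENT * INCIDENT_COST_RATE / 60
    )
    annual_savings = productivity + incident
    infra_per_year = INFRA_PER_MONTH * 12
    first_year_cost = infra_per_year + IMPLEMENTATION_COST
    return {
        "annual_savings": annual_savings,
        "first_year_cost": first_year_cost,
        "first_year_net": annual_savings - first_year_cost,
        # Months of savings needed to cover the first-year cost.
        "payback_months": first_year_cost * 12 / annual_savings,
        "ongoing_annual_net": annual_savings - infra_per_year,
    }


for name, value in roi_summary().items():
    print(f"{name:>20}: {value:,.1f}")
```

The payback figure comes out a little under five months, consistent with the "~5 months" quoted in the template. Treating payback as first-year cost divided by monthly savings is a simplification; finance may prefer a method that accounts for when each cost is actually incurred.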

Compliance Encounters: What You Actually Do

When the SOC2 Auditor Asks You Questions

SOC2 audits typically happen annually, and the auditor will interview engineers. Common questions and what they are really checking:

They ask:              "How do you deploy to production?"
They are checking:     Change management controls exist
Good answer includes:  "We use CI/CD with required code review, automated tests, and CAB approval for production changes"

They ask:              "Who has access to production databases?"
They are checking:     Access is restricted and reviewed
Good answer includes:  "Access is role-based via IAM. We review access quarterly. Here's the last review date"

They ask:              "How do you handle security vulnerabilities?"
They are checking:     Vulnerability management process exists
Good answer includes:  "We run weekly Trivy scans, triage findings in our Tuesday meeting, and patch critical CVEs within 72 hours"

They ask:              "What happens when an employee leaves?"
They are checking:     Offboarding removes access
Good answer includes:  "HR triggers a Jira ticket that revokes all access within 24 hours. Here's the checklist"

They ask:              "How are secrets managed?"
They are checking:     Secrets are not hardcoded
Good answer includes:  "All secrets are in HashiCorp Vault with automatic rotation. No secrets in code repos; we scan for this in CI"

What "a Finding" Means

  • Critical finding: Something is broken and must be fixed before the audit period ends. Example: "No access reviews have been performed in 12 months."
  • High finding: A significant gap that needs a remediation plan with a deadline. Example: "Production deployments do not require approval."
  • Observation: Not a formal finding but a recommendation. Example: "Consider implementing MFA for VPN access."

Your job as an engineer: answer honestly, point to documentation, and do not guess. "I'll need to check on that and get back to you" is always better than making something up.
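
Several of the sample answers in this section quote concrete intervals (quarterly access reviews, critical CVEs patched within 72 hours), and an auditor will ask for dates that back them up. The sketch below shows one way to check those claims against your own records before the interview. It assumes "quarterly" means at most 92 days between reviews, which is an illustrative threshold rather than a SOC2 requirement, and the function names are hypothetical.

```python
from datetime import date, datetime, timedelta

# Assumed thresholds based on the sample answers; adjust to your written policy.
ACCESS_REVIEW_MAX_AGE = timedelta(days=92)   # "quarterly", approximated as 92 days
CRITICAL_PATCH_SLA = timedelta(hours=72)


def access_review_overdue(last_review: date, today: date) -> bool:
    """True if the most recent access review is older than the allowed age."""
    return today - last_review > ACCESS_REVIEW_MAX_AGE


def patched_within_sla(found_at: datetime, patched_at: datetime) -> bool:
    """True if a critical finding was patched inside the SLA window."""
    return patched_at - found_at <= CRITICAL_PATCH_SLA


# Example dates are made up for illustration.
print(access_review_overdue(date(2026, 1, 5), date(2026, 3, 21)))
print(patched_within_sla(datetime(2026, 3, 10, 9, 0), datetime(2026, 3, 12, 18, 0)))
```

If either check fails on real data, it is better to find out before the audit and say so plainly than to quote a policy the evidence does not support.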


Enterprise Architecture Phrases You'll Hear

Phrase                                Translation
──────                                ───────────
"Target state architecture"           What we want the system to look like in 1-2 years
"Current state" vs "future state"     How it works now vs how it should work
"Rationalization"                     Cutting duplicate tools/systems. "We have 4 monitoring tools; let's pick one"
"North star"                          The long-term guiding vision. Usually a slide with boxes and arrows
"Technical debt remediation"          Fixing old shortcuts that now cause problems
"Lift and shift"                      Moving to cloud by just copying VMs as-is (no redesign)
"Cloud-native refactor"               Actually redesigning the app for cloud (containers, managed services)
"Strangler fig pattern"               Gradually replacing an old system piece by piece
"Buy vs build"                        Should we purchase a product or write our own?
"Proof of concept (POC)"              Small test to validate an idea. Sometimes genuine, sometimes a stalling tactic
"Minimum viable product (MVP)"        Smallest useful version of a thing
"Guardrails"                          Automated policies that prevent bad things (e.g., "no public S3 buckets")