The Cloud Bill Surprise
Category: The Migration
Domains: cloud-ops, finops
Read time: ~5 min
Setting the Scene
I was the senior SRE at a healthcare analytics company. We ran everything in two colocated racks — 24 Dell PowerEdge R640s, a NetApp SAN, and a pair of Cisco Nexus switches. Our monthly hosting bill was about $18,000 including power, cooling, and the cage lease. The CFO wanted to "move to the cloud" because our datacenter contract was up for renewal and AWS promised "pay only for what you use."
We budgeted $22,000/month for AWS. We figured a little overhead for the flexibility. The first real bill was $87,000.
What Happened
Weeks 1-2: We did a lift-and-shift. Every physical server became an EC2 instance: the R640s, each with 64GB of RAM and 16 cores, became m5.4xlarge instances. The NetApp volumes became EBS gp2 volumes. We used AWS DMS for the database migration and rsync for the flat files. It went smoothly. Too smoothly.
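Week 3: We turned on the production workload. Our analytics pipeline pulled 2TB of data from S3 every night, processed it, and pushed the results to another S3 bucket. What we didn't model: cross-AZ data transfer. Our processing fleet was in us-east-1a, while the other stages of the pipeline, the ones reading from and writing to the buckets, ran in us-east-1b and us-east-1c. S3 itself is regional, so the bucket reads weren't billed per AZ; the instance-to-instance traffic between AZs was, at $0.01/GB in each direction. That sounds cheap until you multiply it by 2TB a night and by every AZ boundary the data crosses.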
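To see how the cross-AZ line adds up, here is a back-of-the-envelope sketch. AWS bills traffic between instances in different AZs at $0.01/GB on both the sending and the receiving side, so each GB that crosses costs $0.02. The `crossings` parameter (how many times the nightly 2TB crosses an AZ boundary) is hypothetical, since the exact pipeline topology isn't described here, and the function name is illustrative only.

```python
GB_PER_TB = 1_000       # decimal units; fine for an estimate
CROSS_AZ_RATE = 0.01    # $/GB, charged on each side of an AZ boundary

def monthly_cross_az_cost(tb_per_night: float, crossings: int, nights: int = 30) -> float:
    """Dollar cost of moving tb_per_night across AZ boundaries `crossings` times a night."""
    gb_moved = tb_per_night * GB_PER_TB * crossings * nights
    # The sender's AZ bills data out and the receiver's AZ bills data in,
    # so every GB that crosses costs twice the per-direction rate.
    return gb_moved * CROSS_AZ_RATE * 2

for crossings in (1, 3):
    cost = monthly_cross_az_cost(2, crossings)
    print(f"2TB/night, {crossings} crossing(s): ${cost:,.0f}/month")
```

A single crossing comes to roughly $1,200 a month; the figure scales linearly with both the data volume and the number of hops, which is why keeping a chatty pipeline inside one AZ matters.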
Week 4: The EBS charges showed up in Cost Explorer. We had provisioned gp2 volumes sized to match our NetApp LUNs: 24 of them, 500GB each. But gp2 performance scales with volume size (a baseline of 3 IOPS per GB), and our database volumes needed 10,000 IOPS, which on gp2 means roughly 3.3TB per volume. We should have used io1 (or today, io2) with provisioned IOPS; instead we'd oversized the database volumes to get the IOPS, paying for 4TB of storage we didn't need.
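The gp2 sizing trap is easy to see in numbers. gp2 baseline performance is 3 IOPS per GiB, with a floor of 100 IOPS and a ceiling of 16,000 IOPS. The helper names below are illustrative, not part of any AWS SDK, and burst credits (which let gp2 volumes under 1 TiB spike to 3,000 IOPS for a while) are deliberately left out.

```python
import math

# gp2 baseline performance rules (GiB-based): 3 IOPS per GiB,
# never below 100 IOPS and never above 16,000 IOPS.
GP2_IOPS_PER_GIB = 3
GP2_MIN_IOPS = 100
GP2_MAX_IOPS = 16_000

def gp2_baseline_iops(size_gib: int) -> int:
    """Baseline IOPS a gp2 volume of size_gib delivers (burst credits ignored)."""
    return max(GP2_MIN_IOPS, min(GP2_MAX_IOPS, GP2_IOPS_PER_GIB * size_gib))

def gp2_size_for_iops(target_iops: int) -> int:
    """Smallest gp2 volume size (GiB) whose baseline meets target_iops."""
    if target_iops > GP2_MAX_IOPS:
        raise ValueError("gp2 tops out at 16,000 IOPS regardless of size")
    if target_iops <= GP2_MIN_IOPS:
        return 1  # the 100 IOPS floor applies even to a 1 GiB volume
    return math.ceil(target_iops / GP2_IOPS_PER_GIB)

print(gp2_baseline_iops(500))     # 1500: what a 500GB LUN-sized volume gets
print(gp2_size_for_iops(10_000))  # 3334: GiB needed for 10,000 IOPS
```

In other words, a LUN-sized 500GB volume delivers less than a sixth of the IOPS the database needed, and the only gp2 lever for closing that gap is buying more capacity.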
Month 2: The real bill landed. EC2: $31,000. EBS: $12,000. Data transfer: $18,000. S3 requests: $6,000. NAT Gateway: $8,000. CloudWatch: $4,000. Support plan: $5,000. Miscellaneous: $3,000. Total: $87,000. The CFO called an emergency meeting. I brought the AWS pricing calculator and a lot of apologetic energy.
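A quick way to sanity-check a bill like this is to put the line items in code. The sketch below reproduces the Month 2 arithmetic; treating EC2 and EBS as the "modeled" categories is an assumption based on the planning phase having covered only compute and storage.

```python
# Month 2 line items, in US dollars, as listed above.
month2_bill = {
    "EC2": 31_000,
    "EBS": 12_000,
    "Data transfer": 18_000,
    "S3 requests": 6_000,
    "NAT Gateway": 8_000,
    "CloudWatch": 4_000,
    "Support plan": 5_000,
    "Miscellaneous": 3_000,
}
BUDGET = 22_000
# Assumption: the original plan only covered compute (EC2) and storage (EBS).
MODELED = {"EC2", "EBS"}

total = sum(month2_bill.values())
modeled = sum(cost for item, cost in month2_bill.items() if item in MODELED)
unmodeled = total - modeled

print(f"Total: ${total:,} ({total / BUDGET:.2f}x the budget)")
print(f"Modeled: ${modeled:,}  Unmodeled: ${unmodeled:,}")
# Largest line items first, with their share of the total.
for item, cost in sorted(month2_bill.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{item:<14} ${cost:>7,}  {cost / total:6.1%}")
```

Run as-is, it confirms the $87,000 total (about 3.95x the budget) and shows the split is almost even: $43,000 in the two modeled categories and $44,000 everywhere else.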
Months 3-4: We right-sized. We moved to Graviton m6g instances (about 20% cheaper than the equivalent m5), replaced gp2 with gp3 (saving $4,000/month), consolidated the pipeline into a single AZ and routed S3 traffic through a gateway VPC endpoint (which removed the NAT Gateway data-processing charges), set up S3 lifecycle policies, and bought Reserved Instances for the baseline fleet. That got the bill down to $38,000/month.
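The gp3 switch pays off because gp3 breaks the link between size and performance: every volume includes 3,000 IOPS, and extra IOPS are priced separately. A rough per-volume comparison, assuming us-east-1 list prices at the time of writing ($0.10/GB-month for gp2, $0.08/GB-month for gp3, $0.005 per provisioned IOPS-month above 3,000) and ignoring gp3 throughput charges, might look like this. The 500GB, 10,000-IOPS database volume comes from the Week 4 numbers; the function names are illustrative.

```python
import math

# us-east-1 list prices at the time of writing (assumption; check current pricing).
GP2_PER_GB_MONTH = 0.10
GP3_PER_GB_MONTH = 0.08
GP3_INCLUDED_IOPS = 3_000
GP3_PER_EXTRA_IOPS_MONTH = 0.005

def gp2_monthly_cost(data_gb: int, iops_needed: int) -> float:
    # gp2 delivers 3 IOPS per GB, so the volume must grow until it meets the IOPS target.
    size_gb = max(data_gb, math.ceil(iops_needed / 3))
    return size_gb * GP2_PER_GB_MONTH

def gp3_monthly_cost(data_gb: int, iops_needed: int) -> float:
    # gp3 includes 3,000 IOPS at any size; anything above that is bought separately.
    extra_iops = max(0, iops_needed - GP3_INCLUDED_IOPS)
    return data_gb * GP3_PER_GB_MONTH + extra_iops * GP3_PER_EXTRA_IOPS_MONTH

print(f"gp2: ${gp2_monthly_cost(500, 10_000):,.2f}/month")  # oversized to 3,334 GB
print(f"gp3: ${gp3_monthly_cost(500, 10_000):,.2f}/month")  # 500 GB + 7,000 extra IOPS
```

Under these assumptions the same database volume drops from about $333 to $75 a month, because you stop paying for storage that exists only to buy IOPS.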
The Moment of Truth
Sitting in the CFO's office with a spreadsheet showing $87,000 where $22,000 was supposed to be. She wasn't angry — she was confused. "Where does the money go?" I pulled up the Cost Explorer breakdown and watched her expression change as she saw "Data Transfer" as a line item for the first time. Nobody in the planning phase had even considered egress charges. We'd modeled compute and storage. The network was "included." Except it wasn't.
The Aftermath
Six months in, we stabilized at $34,000/month — still almost double our colo cost, but with genuine elasticity benefits. The analytics pipeline could burst to 10x during quarter-end. We couldn't do that in colo without buying 10x the hardware. The CFO accepted the number once we showed the burst math. I became the unofficial FinOps person and set up weekly cost anomaly alerts in Slack.
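The burst math comes down to owning peak capacity all month versus renting it only while it's needed. The sketch below uses the article's $18,000 colo and $34,000 cloud figures but rests on loud simplifications: cost scales linearly with capacity on both sides, the whole cloud bill scales with the burst, and `burst_fraction` (the share of the month spent at 10x) is a hypothetical input. It illustrates the shape of the argument, not the actual numbers shown to the CFO.

```python
def colo_monthly_cost(baseline_monthly: float, peak_multiple: float) -> float:
    # Colo must own peak capacity all month.
    # Assumption: hardware, power, and space scale linearly with capacity.
    return baseline_monthly * peak_multiple

def cloud_monthly_cost(baseline_monthly: float, peak_multiple: float,
                       burst_fraction: float) -> float:
    # Cloud pays for the baseline all month and for the extra
    # (peak_multiple - 1) x capacity only during the burst window.
    # Assumption: burst cost scales linearly with the steady-state bill.
    extra = baseline_monthly * (peak_multiple - 1) * burst_fraction
    return baseline_monthly + extra

# Hypothetical: a 10x burst for 10% of the month.
print(f"colo sized for 10x: ${colo_monthly_cost(18_000, 10):,.0f}")
print(f"cloud with burst:   ${cloud_monthly_cost(34_000, 10, 0.10):,.0f}")
```

With these inputs, colo sized for the peak costs $180,000 a month against $64,600 for the cloud; under the same linear assumptions the two only meet when the system spends roughly 48% of the month at peak (solving 34,000 + 306,000 × f = 180,000).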
The Lessons
- Model costs before migrating: Compute and storage are obvious. Data transfer, NAT Gateway, API requests, and CloudWatch are not. Model all of them with real traffic numbers, not estimates.
- Cloud-native architecture matters: Lift-and-shift preserves your on-prem cost structure and adds cloud overhead on top. You need to re-architect for the cloud pricing model.
- Egress charges are real: $0.09/GB out to the internet (the first pricing tier in us-east-1) and $0.01/GB in each direction across AZs. At scale, data transfer can exceed compute costs. Use VPC endpoints, keep chatty traffic within one AZ, and compress everything.
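Modeling "all of them" can start as a table of per-GB rates per network path. In this minimal sketch the internet and cross-AZ rates are the ones quoted above; the NAT Gateway processing rate ($0.045/GB) is the us-east-1 list price at the time of writing and is an assumption to check against current pricing. NAT Gateway hourly charges are not included, and the 60,000 GB example (the nightly 2TB pull over 30 nights, routed through a NAT Gateway) is hypothetical.

```python
# Per-GB rates by network path (US dollars).
RATES_PER_GB = {
    "internet_egress": 0.09,      # first tier, out to the internet
    "cross_az": 0.01 * 2,         # $0.01/GB billed on each side of the AZ boundary
    "nat_gateway": 0.045,         # data processed by a NAT Gateway, either direction
    "s3_gateway_endpoint": 0.0,   # gateway VPC endpoints for S3 carry no charge
}

def monthly_transfer_cost(gb_by_path: dict[str, float]) -> dict[str, float]:
    """Price each network path given the GB it carries per month."""
    return {path: gb * RATES_PER_GB[path] for path, gb in gb_by_path.items()}

# Hypothetical: the same 60,000 GB/month of S3 reads, two ways.
nightly_pull_gb = 2_000 * 30
via_nat = monthly_transfer_cost({"nat_gateway": nightly_pull_gb})
via_endpoint = monthly_transfer_cost({"s3_gateway_endpoint": nightly_pull_gb})
print(f"via NAT Gateway:      ${via_nat['nat_gateway']:,.0f}/month")
print(f"via gateway endpoint: ${via_endpoint['s3_gateway_endpoint']:,.0f}/month")
```

Routed through a NAT Gateway, the same S3 traffic costs about $2,700 a month in processing fees; through a gateway endpoint it costs nothing, which is the whole case for "use VPC endpoints."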
What I'd Do Differently
I'd run a 2-week cost pilot with production-equivalent traffic before committing to the migration. I'd use the AWS Migration Evaluator (formerly TSO Logic) to model costs from actual utilization data. And I'd appoint a FinOps owner on day one — not after the first surprise bill. Cost monitoring should be in your Grafana dashboard right next to CPU and memory.
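Cost monitoring next to CPU and memory works best with a simple rule that fires on unusual spend. As a minimal sketch, assuming you can export daily cost totals (for example from Cost Explorer), the function below flags any day whose spend jumps well above its trailing week. This is a plain mean-plus-k-standard-deviations threshold, not the model behind AWS Cost Anomaly Detection; `flag_cost_anomalies`, its parameters, and the sample numbers are all hypothetical, and wiring the result into Grafana or Slack is left out.

```python
from statistics import mean, stdev

def flag_cost_anomalies(daily_costs: list[float], window: int = 7,
                        k: float = 3.0, min_delta: float = 0.0) -> list[int]:
    """Return indices of days whose spend exceeds the trailing-window mean by more
    than k standard deviations (and by at least min_delta dollars)."""
    if window < 2:
        raise ValueError("window must be at least 2 to compute a standard deviation")
    flagged = []
    for i in range(window, len(daily_costs)):
        history = daily_costs[i - window:i]
        # min_delta keeps a perfectly flat history from flagging tiny increases.
        threshold = mean(history) + max(k * stdev(history), min_delta)
        if daily_costs[i] > threshold:
            flagged.append(i)
    return flagged

# Hypothetical daily spend: a steady week followed by a spike.
daily = [1000, 1020, 990, 1010, 1005, 995, 1000, 2900]
print(flag_cost_anomalies(daily))  # [7]
```

A rolling window adapts as the baseline drifts (after right-sizing, for instance), and a weekly job that posts any flagged days is enough to catch a surprise long before the invoice does.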
The Quote
"The cloud is not cheaper. The cloud is more flexible. Flexibility costs money if you don't manage it."
Cross-References
- Topic Packs: Cloud Ops Basics, FinOps, AWS EC2