# Cloud Operations Basics Cheat Sheet
One-liner:
`aws sts get-caller-identity` is the cloud equivalent of `whoami`: run it first in every troubleshooting session to confirm which account, role, and user you are operating as. Many "permission denied" issues turn out to be the wrong profile or account.
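A script can use the same call to stop itself before it touches the wrong account. The following is a minimal sketch. The function name `require_aws_account` and the account ID in the usage comment are hypothetical placeholders. The `--query Account --output text` form is standard AWS CLI.

```bash
# Hypothetical helper: stop unless the active credentials belong to the expected account.
require_aws_account() {
  local expected="$1" actual
  # --query Account --output text prints only the 12-digit account ID
  actual=$(aws sts get-caller-identity --query Account --output text) || return 1
  if [ "$actual" != "$expected" ]; then
    echo "refusing to continue: active account is $actual, expected $expected" >&2
    return 1
  fi
}

# Usage (placeholder account ID):
# require_aws_account 123456789012 && terraform apply
```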
## AWS CLI Essentials
```bash
# Identity
aws sts get-caller-identity
aws configure list          # Show active profile, region, and where credentials come from

# EC2: running instances as a table (ID, type, private IP, Name tag)
aws ec2 describe-instances --filters "Name=instance-state-name,Values=running" \
  --query 'Reservations[].Instances[].[InstanceId,InstanceType,PrivateIpAddress,Tags[?Key==`Name`].Value|[0]]' \
  --output table

# S3
aws s3 ls s3://bucket/prefix/
aws s3 sync ./dist s3://bucket/ --delete         # --delete removes remote objects missing locally
aws s3 presign s3://bucket/file --expires-in 3600   # Temporary download URL, valid 1 hour

# Logs
aws logs tail /aws/lambda/my-func --follow       # AWS CLI v2 only
aws logs filter-log-events --log-group-name /ecs/app --filter-pattern "ERROR"
```
## GCP gcloud Essentials
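```bash
gcloud config set project PROJECT_ID
gcloud config list                        # Active account, project, and default region/zone
gcloud auth application-default login     # Application Default Credentials for client libraries and tools like Terraform
gcloud compute instances list
gcloud container clusters list
gcloud container clusters get-credentials CLUSTER --zone ZONE   # Adds a kubeconfig context; use --region for regional clusters
gcloud iam service-accounts list
gcloud logging read 'severity>=ERROR' --limit=50
```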
## IAM Quick Reference
| Concept | AWS | GCP | Azure |
|---|---|---|---|
| User identity | IAM User | Google Account | Microsoft Entra ID (formerly Azure AD) user |
| Machine identity | IAM Role | Service Account | Managed Identity |
| Permission set | IAM Policy | IAM Role | RBAC Role |
| Guardrail / boundary | Permissions boundary, SCP | Organization Policy, IAM deny policy | Azure Policy, deny assignment |
| Temp credentials | STS AssumeRole | Service account impersonation, Workload Identity | Managed Identity token |
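This sketch shows the AWS `STS AssumeRole` row in practice. It exchanges the current credentials for a role's temporary credentials and exports them into the current shell. The function name and the role ARN in the usage note are hypothetical. The `aws sts assume-role` flags and the `Credentials` fields are standard CLI output.

```bash
# Hypothetical helper: assume an IAM role and export its temporary credentials.
assume_role() {
  local role_arn="$1" session="${2:-cli-session}" creds
  # With --output text, the three selected fields print tab-separated on one line
  creds=$(aws sts assume-role --role-arn "$role_arn" --role-session-name "$session" \
    --query 'Credentials.[AccessKeyId,SecretAccessKey,SessionToken]' --output text) || return 1
  # Split the line into positional parameters, then export the standard variables
  set -- $creds
  export AWS_ACCESS_KEY_ID="$1" AWS_SECRET_ACCESS_KEY="$2" AWS_SESSION_TOKEN="$3"
}
```

Run it in the current shell, not a subshell, so the exports persist. For example: `assume_role arn:aws:iam::123456789012:role/ReadOnly`. The credentials expire after the session duration, which defaults to one hour.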
## Networking Comparison
| Concept | AWS | GCP | Azure |
|---|---|---|---|
| Virtual network | VPC | VPC | VNet |
| Subnet | Subnet | Subnet | Subnet |
| Firewall rules | Security Groups | Firewall Rules | NSGs |
| NAT | NAT Gateway | Cloud NAT | NAT Gateway |
| Load balancer (L7) | ALB | HTTP(S) LB | App Gateway |
| Load balancer (L4) | NLB | TCP/UDP LB | Azure LB |
| DNS | Route 53 | Cloud DNS | Azure DNS |
| CDN | CloudFront | Cloud CDN | Azure CDN |
## VPC Troubleshooting Flow
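```text
Can't connect?
├── DNS resolves?              → dig / nslookup from the source host
├── Security Group allows?     → Inbound rules on the target; outbound rules on the source
├── NACL allows?               → Check both inbound AND outbound; NACLs are stateless, so return traffic on ephemeral ports (1024-65535) must be allowed too
├── Route table has a route?   → Check both subnets (request and return path)
├── NAT Gateway (if private → internet)?
├── VPC peering / Transit GW route?
└── Application listening on the right port? → ss -tlnp on the target
```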
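The first and last checks in this flow can be run together from the source host. Those checks are DNS resolution and whether anything answers on the port. This is a minimal sketch. It assumes `getent`, `timeout` (coreutils), and `bash` are installed. The function name is hypothetical. A TCP failure here does not tell you which layer dropped the packet. The middle branches of the flow cover that.

```bash
# Hypothetical helper: check DNS resolution, then a TCP connect with a 3-second timeout.
check_conn() {
  local host="$1" port="$2"
  if ! getent hosts "$host" >/dev/null; then
    echo "DNS: cannot resolve $host" >&2
    return 1
  fi
  # bash's /dev/tcp pseudo-device opens a TCP connection; host/port are passed as $0/$1
  if ! timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null; then
    echo "TCP: no connection to $host:$port (check SG, NACL, routes, listener)" >&2
    return 2
  fi
  echo "OK: $host:$port reachable"
}
```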
## Cost Control Checklist
```text
Daily:
  [ ] Check Cost Explorer for anomalies
  [ ] Review any budget alerts
Weekly:
  [ ] Find orphaned resources (unattached EBS, unused EIPs)
  [ ] Check for oversized instances (CPU < 10% avg)
  [ ] Review data transfer charges
Monthly:
  [ ] Right-size instances based on CloudWatch metrics
  [ ] Evaluate Reserved Instance / Savings Plan coverage
  [ ] Review and clean old snapshots and AMIs
  [ ] Check for idle load balancers
```
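Two read-only queries cover the orphaned-resource examples in the weekly check. The first finds EBS volumes in the `available` state, meaning they are attached to nothing. The second finds Elastic IPs with no association. This is a sketch for the current region only. The function names are hypothetical. Treat the results as candidates for review, not as things to delete blindly.

```bash
# Hypothetical helpers: list orphan candidates in the current region (read-only).

# EBS volumes with status "available" are not attached to any instance.
find_unattached_volumes() {
  aws ec2 describe-volumes --filters Name=status,Values=available \
    --query 'Volumes[].[VolumeId,Size,CreateTime]' --output text
}

# Elastic IPs with no AssociationId are allocated but unused, and are still billed.
find_unused_eips() {
  aws ec2 describe-addresses \
    --query 'Addresses[?AssociationId==`null`].[PublicIp,AllocationId]' --output text
}
```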
Default trap: AWS default quotas are surprisingly low for production use. The most common surprises are 5 Elastic IPs per region and 5 VPCs per region. Lambda's default of 1,000 concurrent executions per region, which can be lower on new accounts, can cause throttling during traffic spikes. Request increases before you need them. Many are approved automatically within minutes, but some go to AWS Support review and can take days. Discovering mid-incident that you need one costs far more than asking early.
## Common AWS Resource Limits
| Resource | Default Limit |
|---|---|
| VPCs per region | 5 |
| Subnets per VPC | 200 |
| Security Groups per VPC | 500 |
| Rules per SG | 60 inbound + 60 outbound |
| EIPs per region | 5 |
| EC2 On-Demand instances | vCPU-based quota per instance family group |
| S3 buckets per account | 100 |
| Lambda concurrent executions | 1,000 per region |
Request increases in the Service Quotas console. Defaults change over time, so check current values there too.
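The `service-quotas` CLI can script the same request. This minimal sketch looks up a quota's code by its display name, then requests a new value. The function names are hypothetical. The name string must match the quota name that Service Quotas shows for that service code exactly.

```bash
# Hypothetical helpers around the AWS Service Quotas API.

# Print "<QuotaCode> <Value>" for a quota, looked up by service code and exact display name.
quota_lookup() {
  local service="$1" name="$2"
  aws service-quotas list-service-quotas --service-code "$service" \
    --query "Quotas[?QuotaName=='$name'].[QuotaCode,Value]" --output text
}

# File an increase request; it may be approved automatically or routed to AWS Support.
quota_request() {
  local service="$1" code="$2" value="$3"
  aws service-quotas request-service-quota-increase \
    --service-code "$service" --quota-code "$code" --desired-value "$value"
}
```

`list-service-quotas` returns applied values, so some quotas may not appear. If `quota_lookup` prints nothing, try `aws service-quotas list-aws-default-service-quotas` with the same `--service-code`.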
Gotcha:
`aws s3 sync --delete` mirrors a local directory to S3 and deletes remote objects that do not exist locally. This is a destructive operation. If you accidentally run it from an empty directory, it deletes everything under the destination prefix, which is the whole bucket if the destination is the bucket root. Always do a dry run first: `aws s3 sync ./dist s3://bucket/ --delete --dryrun`.
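A wrapper can enforce the dry run and refuse the most dangerous case, a missing or empty source directory. This is a hypothetical sketch with an illustrative function name. It previews the planned changes, then asks for confirmation before running the real sync.

```bash
# Hypothetical wrapper: refuse empty sources, preview with --dryrun, then ask before applying.
safe_s3_sync_delete() {
  local src="$1" dest="$2" answer
  # A missing or empty source would make --delete remove everything under dest.
  if [ ! -d "$src" ] || [ -z "$(ls -A "$src")" ]; then
    echo "refusing: source '$src' is missing or empty" >&2
    return 1
  fi
  # Preview only: lists planned uploads and deletes without changing anything
  aws s3 sync "$src" "$dest" --delete --dryrun || return 1
  printf 'Apply these changes to %s? [y/N] ' "$dest" >&2
  read -r answer
  if [ "$answer" != "y" ]; then
    echo "aborted" >&2
    return 1
  fi
  aws s3 sync "$src" "$dest" --delete
}
```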
## Profile and Credential Management
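```bash
# AWS SSO / IAM Identity Center (recommended)
aws configure sso
aws sso login --profile prod        # Refresh an expired SSO session
export AWS_PROFILE=prod
aws sts get-caller-identity         # Confirm the switch took effect

# AWS per-command profile
aws s3 ls --profile staging

# aws-vault (third-party tool): keeps long-lived keys in the OS keychain, injects temporary credentials
aws-vault exec prod -- terraform plan

# GCP configurations
gcloud config configurations create prod
gcloud config configurations activate prod
gcloud config configurations list
```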
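`gcloud config configurations activate` changes the active configuration for every shell, because the setting is stored in the gcloud config directory. It therefore helps to print what you switched to. This is a small sketch with a hypothetical function name. `gcloud config get-value` is the standard way to read a single property.

```bash
# Hypothetical helper: activate a gcloud configuration and show what it points at.
gcloud_use() {
  gcloud config configurations activate "$1" || return 1
  # Print the identity and project now in effect
  echo "account: $(gcloud config get-value account 2>/dev/null)"
  echo "project: $(gcloud config get-value project 2>/dev/null)"
}
```

To switch only the current shell, export `CLOUDSDK_ACTIVE_CONFIG_NAME=prod` instead. That behaves more like `AWS_PROFILE`.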
## Quick Debugging Commands
```bash
# AWS: Check why an instance can't reach the internet
# An empty RouteTables list means the subnet has no explicit association and uses the VPC's main route table
aws ec2 describe-route-tables --filters "Name=association.subnet-id,Values=subnet-xxx"
aws ec2 describe-nat-gateways --filter "Name=state,Values=available"   # This command takes --filter, not --filters
aws ec2 describe-security-groups --group-ids sg-xxx

# AWS: Check EKS cluster and node status
aws eks describe-cluster --name my-cluster --query 'cluster.status'
# Matches managed node group instances; self-managed nodes may be tagged differently
aws ec2 describe-instances --filters "Name=tag:eks:cluster-name,Values=my-cluster" \
  --query 'Reservations[].Instances[].[InstanceId,State.Name]'

# GCP: Check GKE node pool
gcloud container node-pools list --cluster my-cluster --zone us-central1-a
```
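This sketch resolves the route table a subnet actually uses and prints its IPv4 default route. It checks for an explicit association first and otherwise falls back to the VPC's main table. The function name is hypothetical. It only looks at `0.0.0.0/0` and ignores IPv6 routes.

```bash
# Hypothetical helper: find the effective route table for a subnet and print its IPv4 default route.
default_route_target() {
  local subnet="$1" rt vpc
  # 1) Explicitly associated route table (text output prints "None" when there is none)
  rt=$(aws ec2 describe-route-tables \
    --filters "Name=association.subnet-id,Values=$subnet" \
    --query 'RouteTables[0].RouteTableId' --output text) || return 1
  if [ -z "$rt" ] || [ "$rt" = "None" ]; then
    # 2) No explicit association: fall back to the VPC's main route table
    vpc=$(aws ec2 describe-subnets --subnet-ids "$subnet" \
      --query 'Subnets[0].VpcId' --output text) || return 1
    rt=$(aws ec2 describe-route-tables \
      --filters "Name=vpc-id,Values=$vpc" "Name=association.main,Values=true" \
      --query 'RouteTables[0].RouteTableId' --output text) || return 1
  fi
  echo "route table: $rt"
  # Columns: NAT gateway, gateway (e.g. igw-...), transit gateway, route state
  aws ec2 describe-route-tables --route-table-ids "$rt" \
    --query "RouteTables[0].Routes[?DestinationCidrBlock=='0.0.0.0/0'].[NatGatewayId,GatewayId,TransitGatewayId,State]" \
    --output text
}
```

A `blackhole` state in the last column means the route's target, such as a deleted NAT gateway, no longer exists.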