Cloud Operations Basics Drills
Remember: AWS IAM troubleshooting order: Role attached? -> Policy attached to role? -> Policy allows the action? -> Resource policy allows? -> SCP blocking? Within a single account, an allow in either the identity-based or the resource-based policy is enough; for cross-account access, both must allow the action. An explicit deny in any policy overrides every allow.
Debug clue: When an AWS API call fails with `AccessDenied`, make sure CloudTrail is logging that call (S3 object-level calls need data events enabled) and search for the failed event. The `errorCode` and `errorMessage` fields usually name the denied action and the principal that attempted it. Without CloudTrail, you are guessing.
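The denied action can often be pulled straight out of the error text. The snippet below is a minimal sketch, assuming the message uses the common "is not authorized to perform: <action>" wording (some services phrase it differently). `denied_action` is a hypothetical helper name, and the account ID, role, and bucket in the sample message are made up.

```shell
# Hypothetical helper: read an AccessDenied message on stdin and print the
# denied IAM action. Assumes the common wording
# "... is not authorized to perform: <action> ..."; prints nothing otherwise.
denied_action() {
  sed -n 's/.*is not authorized to perform: \([^ ]*\).*/\1/p'
}

# Sample message with made-up identifiers, for illustration only.
msg='User: arn:aws:sts::123456789012:assumed-role/MyRole/i-123 is not authorized to perform: s3:GetObject on resource: arn:aws:s3:::my-bucket/app.conf'
printf '%s\n' "$msg" | denied_action
```

Knowing the exact action narrows the search to the policies that mention it, instead of reading every attached policy in full.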
Drill 1: IAM Policy Analysis
Difficulty: Easy
Q: An EC2 instance can't read from S3. What's the first thing you check?
Answer
# 1. Check if the instance has an IAM role attached
aws ec2 describe-instances --instance-ids i-123 \
--query 'Reservations[0].Instances[0].IamInstanceProfile'
# 2. Check what the role allows
aws iam list-attached-role-policies --role-name MyRole # Managed policies
aws iam list-role-policies --role-name MyRole # Inline policy names
aws iam get-role-policy --role-name MyRole --policy-name MyPolicy # Inline policy document
# 3. Check for bucket policy restrictions
aws s3api get-bucket-policy --bucket my-bucket
# 4. Test with the instance's credentials
aws sts get-caller-identity # Run ON the instance
aws s3 ls s3://my-bucket/ # Test access
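Step 1 returns an instance profile ARN, while step 2 needs a role name, and the two names can differ. A small sketch of bridging them follows; `profile_name_from_arn` is a hypothetical helper and the ARN is a made-up example.

```shell
# Hypothetical helper: the instance profile name is the last path segment
# of the ARN that describe-instances returns.
profile_name_from_arn() {
  printf '%s\n' "${1##*/}"
}

# Made-up ARN, for illustration only.
arn='arn:aws:iam::123456789012:instance-profile/MyProfile'
profile_name_from_arn "$arn"
```

With that name, `aws iam get-instance-profile --instance-profile-name MyProfile --query 'InstanceProfile.Roles[0].RoleName' --output text` should return the role to inspect in step 2.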
Drill 2: VPC Connectivity Debugging
Difficulty: Medium
Q: A pod in EKS can't reach an RDS database. Walk through the troubleshooting steps.
Answer
# 1. Verify Security Groups
# RDS SG must allow inbound on the DB port (5432 for PostgreSQL, 3306 for MySQL) from the EKS node/pod SG
aws ec2 describe-security-groups --group-ids sg-rds-123
# 2. Verify subnet routing
# EKS and RDS must be in subnets that can route to each other
aws ec2 describe-route-tables --filters "Name=association.subnet-id,Values=subnet-123"
# 3. Check NACLs (often overlooked)
aws ec2 describe-network-acls --filters "Name=association.subnet-id,Values=subnet-123"
# 4. Verify DNS resolution
kubectl exec pod -- nslookup mydb.cluster-xxx.us-east-1.rds.amazonaws.com
# 5. Test connectivity
kubectl exec pod -- nc -zv mydb.cluster-xxx.us-east-1.rds.amazonaws.com 5432
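How the test in step 5 fails is itself a clue about which earlier step to revisit. The mapping below is a rough rule of thumb rather than an exhaustive diagnosis; `likely_cause` and its symptom keywords are hypothetical names chosen for illustration.

```shell
# Hypothetical helper: map the way a connection test fails to the layer
# worth checking first. A rule of thumb only, not a full diagnosis.
likely_cause() {
  case "$1" in
    dns)     echo "DNS: name did not resolve; check the endpoint name and VPC DNS settings" ;;
    timeout) echo "Network: packets dropped; check security groups, NACLs, and route tables" ;;
    refused) echo "Host reached but port closed; check the DB port and instance status" ;;
    *)       echo "Unknown symptom: $1" ;;
  esac
}

likely_cause timeout
```

A timeout points back to steps 1 to 3, because security groups and NACLs drop traffic silently, whereas a refused connection means the packets arrived and something answered.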
Drill 3: AWS CLI Essentials
Difficulty: Easy
Q: List all running EC2 instances with their name, instance ID, and private IP.
Answer
aws ec2 describe-instances \
--filters "Name=instance-state-name,Values=running" \
--query 'Reservations[].Instances[].[Tags[?Key==`Name`].Value|[0],InstanceId,PrivateIpAddress]' \
--output table
# Other essential queries:
# All EBS volumes not attached:
aws ec2 describe-volumes --filters "Name=status,Values=available" \
--query 'Volumes[].[VolumeId,Size,CreateTime]' --output table
# S3 bucket sizes:
aws s3 ls --summarize --human-readable --recursive s3://my-bucket/
# Who am I?
aws sts get-caller-identity
Drill 4: GCP gcloud Essentials
Difficulty: Easy
Q: List all GKE clusters in the current project, then get credentials for one.
Answer
# List clusters
gcloud container clusters list
# Get credentials (updates kubeconfig)
gcloud container clusters get-credentials my-cluster \
--zone us-central1-a \
--project my-project
# Other essentials:
gcloud config set project my-project # Switch project
gcloud config list # Show current config
gcloud compute instances list # List VMs
gcloud iam service-accounts list # List SAs
gcloud auth application-default login # Auth for local dev
Drill 5: Cloud Storage Operations
Difficulty: Easy
Q: Sync a local directory to S3, excluding .git and node_modules.
Answer
# AWS S3
# Tip: add --dryrun first to preview what --delete would remove
aws s3 sync ./dist/ s3://my-bucket/app/ \
--exclude ".git/*" \
--exclude "node_modules/*" \
--delete # Remove files from S3 not in local
# GCP GCS equivalent
# -x takes a Python regex, so escape the dot and anchor at the path start
gsutil -m rsync -r -d -x '^\.git/|^node_modules/' ./dist/ gs://my-bucket/app/
# Useful S3 commands:
aws s3 cp file.tar.gz s3://bucket/backups/ # Single file
aws s3 presign s3://bucket/file.zip --expires-in 3600 # Signed URL (1hr)
aws s3 ls s3://bucket/ --recursive --summarize # Total size
Drill 6: DNS and Route53
Difficulty: Medium
Q: An application's domain isn't resolving after you created a Route53 record. Debug it.
Answer
# 1. Verify the record exists
aws route53 list-resource-record-sets \
--hosted-zone-id Z123 \
--query "ResourceRecordSets[?Name=='app.example.com.']"
# 2. Check DNS propagation
dig app.example.com +short
dig app.example.com @8.8.8.8 # Query Google DNS
dig app.example.com +trace # Full resolution path
# 3. Check if domain uses Route53 nameservers
dig NS example.com +short
# Compare with Route53 hosted zone NS records
# 4. TTL issues
dig app.example.com +noall +answer # Check TTL value
# Resolvers may keep serving the old answer until the previous record's TTL expires
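Step 3 comes down to comparing two name-server lists that can differ in order, case, and trailing dots. The sketch below shows one way to compare them as sets; `normalize_ns` and `same_ns_set` are hypothetical helpers, and the sample server names are made up.

```shell
# Hypothetical helper: one name server per line, lowercased, trailing dot
# removed, sorted and de-duplicated, so two lists compare as sets.
normalize_ns() {
  printf '%s\n' "$1" | tr -s ' \t' '\n\n' | tr 'A-Z' 'a-z' \
    | sed 's/\.$//' | grep -v '^$' | sort -u
}

# Succeeds only when both lists contain the same name servers.
same_ns_set() {
  [ "$(normalize_ns "$1")" = "$(normalize_ns "$2")" ]
}

# Made-up samples: the first mimics `dig NS example.com +short` output,
# the second a tab-separated list as the AWS CLI prints with --output text.
from_dig=$(printf 'ns-2.example-dns.org.\nNS-1.example-dns.com.')
from_zone=$(printf 'ns-1.example-dns.com\tns-2.example-dns.org')

if same_ns_set "$from_dig" "$from_zone"; then
  echo "delegation matches the hosted zone"
else
  echo "delegation mismatch: registrar points elsewhere"
fi
```

The hosted zone's list can be fetched with `aws route53 get-hosted-zone --id Z123 --query 'DelegationSet.NameServers' --output text`. A mismatch usually means the registrar still delegates to a different set of name servers.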
Drill 7: Cloud Networking — Subnets and CIDRs
Difficulty: Medium
Q: Design a VPC with public and private subnets across 3 AZs. What CIDRs would you use for a /16 VPC?
Answer
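VPC: 10.0.0.0/16 (65,536 IPs)
Public subnets (/20 = 4,096 IPs each):
AZ-a: 10.0.0.0/20 (10.0.0.4 – 10.0.15.254)
AZ-b: 10.0.16.0/20 (10.0.16.4 – 10.0.31.254)
AZ-c: 10.0.32.0/20 (10.0.32.4 – 10.0.47.254)
Private subnets (/18 = 16,384 IPs each):
AZ-a: 10.0.64.0/18 (10.0.64.4 – 10.0.127.254)
AZ-b: 10.0.128.0/18 (10.0.128.4 – 10.0.191.254)
AZ-c: 10.0.192.0/18 (10.0.192.4 – 10.0.255.254)
Spare: 10.0.48.0/20 (unallocated, room for one more /20)
Note: AWS reserves the first four addresses and the last address of every subnet, so the ranges above show usable IPs: 4,091 per /20 and 16,379 per /18.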
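The block boundaries in this layout can be checked with plain shell arithmetic. The sketch below prints the first address, last address, and size of each CIDR block; it reports raw block boundaries and does not subtract the addresses AWS reserves in each subnet. `cidr_range` is a hypothetical helper name.

```shell
# Hypothetical helper: print "first last count" for an IPv4 CIDR block.
# Runs in a subshell so the IFS change and variables do not leak out.
cidr_range() (
  ip=${1%/*}
  prefix=${1#*/}
  IFS=.
  set -- $ip                      # split the dotted quad into $1..$4
  IFS=' '
  addr=$(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
  count=$(( 1 << (32 - $prefix) ))
  first=$(( $addr / $count * $count ))   # clear any host bits
  last=$(( $first + $count - 1 ))
  dotted() {
    printf '%d.%d.%d.%d' $(( ($1 >> 24) & 255 )) $(( ($1 >> 16) & 255 )) \
      $(( ($1 >> 8) & 255 )) $(( $1 & 255 ))
  }
  printf '%s %s %d\n' "$(dotted "$first")" "$(dotted "$last")" "$count"
)

# The six subnets from the layout above.
for block in 10.0.0.0/20 10.0.16.0/20 10.0.32.0/20 \
             10.0.64.0/18 10.0.128.0/18 10.0.192.0/18; do
  printf '%-16s %s\n' "$block" "$(cidr_range "$block")"
done
```

Because each block's last address sits just below the next block's first address, the output also confirms that none of the subnets overlap.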
Drill 8: Load Balancer Types
Difficulty: Medium
Q: When would you choose ALB vs NLB vs CLB on AWS? Give a concrete use case for each.
Answer
| Type | Layer | Use Case | Key Feature |
|------|-------|----------|-------------|
| **ALB** | L7 (HTTP) | Web apps, microservices | Path/host routing, gRPC, WebSocket |
| **NLB** | L4 (TCP/UDP) | High-perf, static IPs, non-HTTP | Millions of RPS, preserves source IP |
| **CLB** | L4/L7 (legacy) | Don't use for new workloads | Previous generation, kept for existing setups |

ALB examples:
- Route /api/* → backend service, /* → frontend
- Route api.example.com → service A, web.example.com → service B
- Terminate TLS, integrate with WAF
NLB examples:
- gRPC over HTTP/2 with static IPs
- Gaming servers needing UDP
- PrivateLink service endpoint
- EKS with aws-load-balancer-controller (type: nlb-ip)
Drill 9: Cost Investigation
Difficulty: Medium
Q: Your AWS bill jumped 40% this month. How do you investigate?
Answer
# 1. Cost Explorer — filter by service
# AWS Console → Billing → Cost Explorer
# Group by: Service, then by Usage Type
# 2. CLI cost check (last 7 days by service)
# Note: date -d is GNU date; on macOS/BSD use $(date -v-7d +%Y-%m-%d) for the start
aws ce get-cost-and-usage \
--time-period Start=$(date -d '7 days ago' +%Y-%m-%d),End=$(date +%Y-%m-%d) \
--granularity DAILY \
--metrics BlendedCost \
--group-by Type=DIMENSION,Key=SERVICE
# 3. Check for resource sprawl
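# --output text puts the types on one tab-separated line; split them before counting
aws ec2 describe-instances --query 'Reservations[].Instances[].InstanceType' --output text | tr '\t' '\n' | sort | uniq -c | sort -rn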
aws ec2 describe-volumes --filters "Name=status,Values=available" # Orphaned EBS
aws ec2 describe-addresses --query 'Addresses[?AssociationId==null]' # Unused EIPs
# 4. Check data transfer
# Often the hidden cost — NAT Gateway, cross-AZ, internet egress
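Once per-service totals for two periods are in hand, ranking the changes shows where the jump came from. The sketch below assumes tab-separated input of service, previous cost, and current cost; `cost_deltas` is a hypothetical helper and the figures are made up for illustration.

```shell
# Hypothetical helper: read "service<TAB>previous<TAB>current" lines on
# stdin and print each service's change, biggest increase first.
cost_deltas() {
  awk -F '\t' '{ printf "%.2f\t%s\n", $3 - $2, $1 }' | sort -rn
}

# Made-up monthly figures in USD, for illustration only.
printf 'EC2\t1000\t1100\nNAT Gateway\t200\t650\nS3\t300\t290\n' | cost_deltas
```

Sorting by absolute change rather than by total spend surfaces small services that grew sharply, which is how data transfer charges tend to show up.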
Drill 10: Multi-Account / Multi-Project
Difficulty: Hard
Q: You manage 5 AWS accounts (dev, staging, prod, shared-services, security). How do you switch between them efficiently?
Answer
# AWS: Named profiles + SSO
aws configure sso
# Creates profiles in ~/.aws/config
# Switch with:
export AWS_PROFILE=prod
aws sts get-caller-identity # Verify
# Or per-command:
aws s3 ls --profile staging
# Better: use aws-vault for credential management
aws-vault exec prod -- aws s3 ls
aws-vault exec prod -- terraform plan
# GCP equivalent:
gcloud config configurations create prod
gcloud config configurations activate prod
gcloud config set project my-prod-project
# List all configurations:
gcloud config configurations list
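Switching profiles quickly makes it easy to run a command in the wrong account. The sketch below is one possible guard that compares the active account ID with the expected one before anything destructive runs; `require_account` is a hypothetical helper and the account IDs are made up.

```shell
# Hypothetical guard: succeed only when the active account ID (second
# argument) matches the expected account ID (first argument).
require_account() {
  if [ "$2" != "$1" ]; then
    echo "refusing to run: in account $2, expected $1" >&2
    return 1
  fi
}

# Made-up account IDs, for illustration only.
require_account 111111111111 111111111111 && echo "account check passed"
```

In practice the second argument would come from `aws sts get-caller-identity --query Account --output text`, so a line such as `require_account 111111111111 "$(aws sts get-caller-identity --query Account --output text)" && terraform plan` only proceeds in the intended account.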