Portal | Level: L1: Foundations | Topics: Cloud Deep Dive | Domain: Cloud

Cloud Operations Basics Drills

Remember: AWS IAM troubleshooting order: Role attached? -> Policy attached to role? -> Policy allows the action? -> Resource policy allows? -> SCP blocking? Within a single account, an Allow in either the identity-based policy or the resource-based policy is enough; cross-account access needs an Allow in both. An explicit Deny in any policy (identity, resource, SCP, permissions boundary) overrides every Allow.
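
The evaluation order above can be sketched as a small decision function. This is a simplified model, not the full AWS algorithm: `iam_decision` is a hypothetical helper, each argument is the effect one policy layer yields for the request (`Allow`, `Deny`, or `None`), and permissions boundaries and session policies are ignored. Pass `Allow` for the SCP argument when no SCP restricts the account.

```shell
#!/bin/sh
# Simplified sketch of IAM policy evaluation for a single request.
# Usage: iam_decision <scp> <identity_policy> <resource_policy> [cross_account: yes|no]
iam_decision() (
  scp=$1; identity=$2; resource=$3; cross=${4:-no}

  if [ "$scp" = Deny ] || [ "$identity" = Deny ] || [ "$resource" = Deny ]; then
    # An explicit Deny anywhere ends evaluation.
    echo "Deny (explicit)"
  elif [ "$scp" != Allow ]; then
    # SCPs act as a filter: without an SCP Allow, nothing else matters.
    echo "Deny (implicit, SCP)"
  elif [ "$cross" = yes ]; then
    # Cross-account: both sides must allow.
    if [ "$identity" = Allow ] && [ "$resource" = Allow ]; then
      echo "Allow"
    else
      echo "Deny (implicit)"
    fi
  else
    # Same account: one Allow is enough.
    if [ "$identity" = Allow ] || [ "$resource" = Allow ]; then
      echo "Allow"
    else
      echo "Deny (implicit)"
    fi
  fi
)
```

For example, `iam_decision Allow None Allow` models a same-account bucket policy that grants access to a role with no S3 permissions of its own.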

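Debug clue: When an AWS API call fails with AccessDenied, look up the failed event in CloudTrail. Event history keeps the last 90 days of management events by default; S3 object-level calls such as GetObject are data events and appear only if a trail is configured to log them. The errorCode and errorMessage fields usually name the principal and the denied action. Without CloudTrail, you are guessing.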
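
A minimal sketch of that lookup from the CLI, assuming a hypothetical helper named `denied_events`. It filters recent events for one username and keeps only those whose raw event JSON carries an access-denied style error code. For an EC2 instance role the username is typically the instance ID (the role session name); treat that as an assumption to verify in your account.

```shell
#!/bin/sh
# Print recent CloudTrail events for a username that failed with an
# AccessDenied / UnauthorizedOperation error code.
# Usage: denied_events <username> [max_results, up to 50]
denied_events() (
  username=$1; max=${2:-50}
  aws cloudtrail lookup-events \
    --lookup-attributes "AttributeKey=Username,AttributeValue=$username" \
    --max-results "$max" \
    --query 'Events[].CloudTrailEvent' \
    --output text |
    tr '\t' '\n' |    # text output separates the JSON strings with tabs
    grep -E '"errorCode": ?"[^"]*(AccessDenied|UnauthorizedOperation)'
)
```

Each matching line is the full event JSON, so the `errorMessage` field can be read directly from the output.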

Drill 1: IAM Policy Analysis

Difficulty: Easy

Q: An EC2 instance can't read from S3. What's the first thing you check?

Answer
# 1. Check if the instance has an IAM role attached
aws ec2 describe-instances --instance-ids i-123 \
  --query 'Reservations[0].Instances[0].IamInstanceProfile'

# 2. Check what the role allows
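aws iam list-attached-role-policies --role-name MyRole               # Managed policies
aws iam list-role-policies --role-name MyRole                        # Inline policy names
aws iam get-role-policy --role-name MyRole --policy-name MyPolicy    # Inline policy document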

# 3. Check for bucket policy restrictions
aws s3api get-bucket-policy --bucket my-bucket

# 4. Test with the instance's credentials
aws sts get-caller-identity   # Run ON the instance
aws s3 ls s3://my-bucket/     # Test access
Checklist: Instance role → Role policies → Bucket policy → SCPs → VPC endpoint policy
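
The "Role policies" step can also be checked with the IAM policy simulator instead of reading each policy by hand. A minimal sketch, assuming a hypothetical helper `can_role_do` and placeholder ARNs. The simulator evaluates the role's identity-based policies unless a resource policy is passed in, so a bucket policy or VPC endpoint policy can still block the real call.

```shell
#!/bin/sh
# Ask the IAM policy simulator whether a role's policies allow an action.
# Prints allowed, explicitDeny, or implicitDeny.
# Usage: can_role_do <role_arn> <action> <resource_arn>
can_role_do() (
  role_arn=$1; action=$2; resource_arn=$3
  aws iam simulate-principal-policy \
    --policy-source-arn "$role_arn" \
    --action-names "$action" \
    --resource-arns "$resource_arn" \
    --query 'EvaluationResults[0].EvalDecision' \
    --output text
)

# Example (placeholder ARNs):
#   can_role_do arn:aws:iam::123456789012:role/MyRole s3:GetObject 'arn:aws:s3:::my-bucket/*'
```

The caller needs `iam:SimulatePrincipalPolicy` permission to run this.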

Drill 2: VPC Connectivity Debugging

Difficulty: Medium

Q: A pod in EKS can't reach an RDS database. Walk through the troubleshooting steps.

Answer
# 1. Verify Security Groups
# RDS SG must allow inbound on the database port (5432 for PostgreSQL, 3306 for MySQL) from the EKS node/pod SG
aws ec2 describe-security-groups --group-ids sg-rds-123

# 2. Verify subnet routing
# EKS and RDS must be in subnets that can route to each other
aws ec2 describe-route-tables --filters "Name=association.subnet-id,Values=subnet-123"

# 3. Check NACLs (often overlooked)
aws ec2 describe-network-acls --filters "Name=association.subnet-id,Values=subnet-123"

# 4. Verify DNS resolution
kubectl exec pod -- nslookup mydb.cluster-xxx.us-east-1.rds.amazonaws.com

# 5. Test connectivity
kubectl exec pod -- nc -zv mydb.cluster-xxx.us-east-1.rds.amazonaws.com 5432
Order: Security Groups → Route Tables → NACLs → DNS → Direct connectivity test
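
The first step in that order can be scripted rather than read off the raw JSON. A minimal sketch, assuming a hypothetical helper `sg_allows_from`. It only inspects SG-to-SG rules that carry an explicit port range, so CIDR-based rules and "all traffic" rules still need a manual look.

```shell
#!/bin/sh
# Does the database SG allow a given port from a given source SG?
# Prints "allowed" or "not allowed".
# Usage: sg_allows_from <db_sg_id> <source_sg_id> <port>
sg_allows_from() (
  db_sg=$1; source_sg=$2; port=$3
  # JMESPath: keep rules whose port range covers $port, then list source SG IDs.
  query='SecurityGroups[0].IpPermissions[?FromPort<=`'"$port"'` && ToPort>=`'"$port"'`].UserIdGroupPairs[].GroupId'
  allowed=$(aws ec2 describe-security-groups --group-ids "$db_sg" \
    --query "$query" --output text)
  # Collapse tabs/newlines to single spaces, then look for an exact ID match.
  case " $(echo $allowed) " in
    *" $source_sg "*) echo "allowed" ;;
    *)                echo "not allowed" ;;
  esac
)

# Example: sg_allows_from sg-rds-123 sg-nodes-456 5432
```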

Drill 3: AWS CLI Essentials

Difficulty: Easy

Q: List all running EC2 instances with their name, instance ID, and private IP.

Answer
aws ec2 describe-instances \
  --filters "Name=instance-state-name,Values=running" \
  --query 'Reservations[].Instances[].[Tags[?Key==`Name`].Value|[0],InstanceId,PrivateIpAddress]' \
  --output table

# Other essential queries:
# All EBS volumes not attached:
aws ec2 describe-volumes --filters "Name=status,Values=available" \
  --query 'Volumes[].[VolumeId,Size,CreateTime]' --output table

# S3 bucket sizes:
aws s3 ls --summarize --human-readable --recursive s3://my-bucket/

# Who am I?
aws sts get-caller-identity

Drill 4: GCP gcloud Essentials

Difficulty: Easy

Q: List all GKE clusters in the current project, then get credentials for one.

Answer
# List clusters
gcloud container clusters list

# Get credentials (updates kubeconfig)
gcloud container clusters get-credentials my-cluster \
  --zone us-central1-a \
  --project my-project

# Other essentials:
gcloud config set project my-project      # Switch project
gcloud config list                        # Show current config
gcloud compute instances list             # List VMs
gcloud iam service-accounts list          # List SAs
gcloud auth application-default login     # Auth for local dev

Drill 5: Cloud Storage Operations

Difficulty: Easy

Q: Sync a local directory to S3, excluding .git and node_modules.

Answer
# AWS S3
aws s3 sync ./dist/ s3://my-bucket/app/ \
  --exclude ".git/*" \
  --exclude "node_modules/*" \
  --delete    # Remove files from S3 not in local

# GCP GCS equivalent
gsutil -m rsync -r -d -x '\.git/|node_modules/' ./dist/ gs://my-bucket/app/   # -x takes a regex; escape the dot

# Useful S3 commands:
aws s3 cp file.tar.gz s3://bucket/backups/     # Single file
aws s3 presign s3://bucket/file.zip --expires-in 3600  # Signed URL (1hr)
aws s3 ls s3://bucket/ --recursive --summarize  # Total size

Drill 6: DNS and Route53

Difficulty: Medium

Q: An application's domain isn't resolving after you created a Route53 record. Debug it.

Answer
# 1. Verify the record exists
aws route53 list-resource-record-sets \
  --hosted-zone-id Z123 \
  --query "ResourceRecordSets[?Name=='app.example.com.']"

# 2. Check DNS propagation
dig app.example.com +short
dig app.example.com @8.8.8.8       # Query Google DNS
dig app.example.com +trace          # Full resolution path

# 3. Check if domain uses Route53 nameservers
dig NS example.com +short
# Compare with Route53 hosted zone NS records

# 4. TTL issues
dig app.example.com +noall +answer  # Check TTL value
# Resolvers may keep serving the old (or negative) answer until its TTL expires
Common causes:
- Domain nameservers don't point to Route53
- Record name mismatch, e.g. a filter without the trailing dot that Route53 returns in record names
- CNAME at zone apex (not allowed; use an alias record)
- TTL caching of old/missing record
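
The first cause is the easiest to check mechanically. A minimal sketch, assuming a hypothetical helper `ns_matches_route53` and a public hosted zone. It compares the nameservers that public DNS returns for the domain with the delegation set of the hosted zone.

```shell
#!/bin/sh
# Compare the domain's public NS records with the Route53 hosted zone's
# delegation set. Prints "match" or "mismatch".
# Usage: ns_matches_route53 <domain> <hosted_zone_id>
ns_matches_route53() (
  domain=$1; zone_id=$2

  # Normalize both lists: lowercase, no trailing dot, sorted.
  normalize() { tr 'A-Z' 'a-z' | sed 's/\.$//' | sort; }

  public=$(dig NS "$domain" +short | normalize)
  hosted=$(aws route53 get-hosted-zone --id "$zone_id" \
    --query 'DelegationSet.NameServers[]' --output text |
    tr '\t' '\n' | normalize)

  if [ -n "$public" ] && [ "$public" = "$hosted" ]; then
    echo "match"
  else
    echo "mismatch"
  fi
)

# Example: ns_matches_route53 example.com Z123
```

A mismatch usually means the registrar still points at another provider, or at the nameservers of a different hosted zone for the same domain.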

Drill 7: Cloud Networking — Subnets and CIDRs

Difficulty: Medium

Q: Design a VPC with public and private subnets across 3 AZs. What CIDRs would you use for a /16 VPC?

Answer
VPC: 10.0.0.0/16 (65,536 IPs)

Public subnets (/20 = 4,096 IPs each, 4,091 usable on AWS):
  AZ-a: 10.0.0.0/20    (10.0.0.4 – 10.0.15.254)
  AZ-b: 10.0.16.0/20   (10.0.16.4 – 10.0.31.254)
  AZ-c: 10.0.32.0/20   (10.0.32.4 – 10.0.47.254)

Private subnets (/18 = 16,384 IPs each, 16,379 usable on AWS):
  AZ-a: 10.0.64.0/18   (10.0.64.4 – 10.0.127.254)
  AZ-b: 10.0.128.0/18  (10.0.128.4 – 10.0.191.254)
  AZ-c: 10.0.192.0/18  (10.0.192.4 – 10.0.255.254)
Key points:
- Public subnets: smaller (load balancers, NAT gateways)
- Private subnets: larger (pods, instances, databases)
- Leave room for growth (this layout keeps 10.0.48.0/20 unallocated)
- EKS needs at least 2 AZs, 3 is better
- Each AWS subnet reserves 5 IPs: network address, VPC router, DNS, one held for future use, and broadcast (the first four addresses and the last)
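
The subnet arithmetic can be checked with plain shell integer math. A minimal sketch, assuming hypothetical helpers `ip_to_int`, `int_to_ip`, and `aws_subnet_info`. It applies the AWS rule that the first four addresses and the last address of every subnet are reserved, and it does not validate that the base address is aligned to the prefix.

```shell
#!/bin/sh
# Convert a dotted-quad IPv4 address to an integer.
ip_to_int() (
  IFS=.
  set -- $1    # split the address on dots into $1..$4
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
)

# Convert an integer back to dotted-quad form.
int_to_ip() {
  echo "$(( ($1 >> 24) & 255 )).$(( ($1 >> 16) & 255 )).$(( ($1 >> 8) & 255 )).$(( $1 & 255 ))"
}

# Print total size, AWS-usable count, and AWS-usable range for a CIDR block.
# Usage: aws_subnet_info <cidr>
aws_subnet_info() (
  base=${1%/*}; prefix=${1#*/}
  size=$(( 1 << (32 - $prefix) ))
  start=$(ip_to_int "$base")
  first=$(int_to_ip $(( $start + 4 )))          # .0 to .3 are reserved
  last=$(int_to_ip $(( $start + $size - 2 )))   # last address is reserved
  echo "$1 total=$size usable=$(( $size - 5 )) range=$first-$last"
)

aws_subnet_info 10.0.16.0/20
aws_subnet_info 10.0.64.0/18
```

The same function can be looped over a planned subnet list to check a layout before creating the VPC.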

Drill 8: Load Balancer Types

Difficulty: Medium

Q: When would you choose ALB vs NLB vs CLB on AWS? Give a concrete use case for each.

Answer

| Type | Layer | Use Case | Key Feature |
|------|-------|----------|-------------|
| **ALB** | L7 (HTTP/HTTPS) | Web apps, microservices | Path/host routing, gRPC, WebSocket |
| **NLB** | L4 (TCP/UDP/TLS) | High-perf, static IPs, non-HTTP | Millions of RPS, preserves source IP |
| **CLB** | L4/L7 (legacy) | Don't use for new workloads | Previous generation; kept for existing deployments |

ALB examples:
- Route /api/* → backend service, /* → frontend
- Route api.example.com → service A, web.example.com → service B
- Terminate TLS, integrate with WAF

NLB examples:
- gRPC over HTTP/2 with static IPs
- Gaming servers needing UDP
- PrivateLink service endpoint
- EKS with aws-load-balancer-controller (type: nlb-ip)
In Kubernetes:
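# ALB: created from an Ingress by aws-load-balancer-controller
metadata:
  annotations:
    alb.ingress.kubernetes.io/scheme: "internet-facing"
    alb.ingress.kubernetes.io/target-type: "ip"
spec:
  ingressClassName: alb

# NLB: created from a Service of type LoadBalancer
# ("external" hands the Service to aws-load-balancer-controller;
#  "nlb" uses the legacy in-tree controller)
metadata:
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-type: "external"
    service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: "ip"
    service.beta.kubernetes.io/aws-load-balancer-scheme: "internet-facing"
spec:
  type: LoadBalancer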

Drill 9: Cost Investigation

Difficulty: Medium

Q: Your AWS bill jumped 40% this month. How do you investigate?

Answer
# 1. Cost Explorer — filter by service
# AWS Console → Billing → Cost Explorer
# Group by: Service, then by Usage Type

# 2. CLI cost check (last 7 days by service; `date -d` is GNU date syntax)
aws ce get-cost-and-usage \
  --time-period Start=$(date -d '7 days ago' +%Y-%m-%d),End=$(date +%Y-%m-%d) \
  --granularity DAILY \
  --metrics BlendedCost \
  --group-by Type=DIMENSION,Key=SERVICE

# 3. Check for resource sprawl
aws ec2 describe-instances --query 'Reservations[].Instances[].InstanceType' --output text | tr '\t' '\n' | sort | uniq -c | sort -rn
aws ec2 describe-volumes --filters "Name=status,Values=available"  # Orphaned EBS
aws ec2 describe-addresses --query 'Addresses[?AssociationId==null]'  # Unused EIPs

# 4. Check data transfer
# Often the hidden cost — NAT Gateway, cross-AZ, internet egress
Top cost suspects:
- NAT Gateway data processing ($0.045/GB in us-east-1; varies by region)
- Orphaned EBS volumes / snapshots
- Idle/oversized EC2 instances
- Cross-AZ data transfer
- Forgotten dev/test environments
- CloudWatch log ingestion
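
To see which service moved, it helps to rank services by cost for this month and last month and compare the two lists. A minimal sketch, assuming a hypothetical helper `top_services` that is called with one calendar month per invocation. The end date is exclusive in Cost Explorer, and very small amounts reported in scientific notation may sort incorrectly.

```shell
#!/bin/sh
# Print the top N services by unblended cost for one period,
# as tab-separated "service<TAB>amount" lines.
# Usage: top_services <start YYYY-MM-DD> <end YYYY-MM-DD> [n]
top_services() (
  start=$1; end=$2; n=${3:-5}
  tab=$(printf '\t')
  aws ce get-cost-and-usage \
    --time-period "Start=$start,End=$end" \
    --granularity MONTHLY \
    --metrics UnblendedCost \
    --group-by Type=DIMENSION,Key=SERVICE \
    --query 'ResultsByTime[0].Groups[].[Keys[0],Metrics.UnblendedCost.Amount]' \
    --output text |
    sort -t "$tab" -k2,2 -rn |    # service names contain spaces, so split on tabs
    head -n "$n"
)

# Example (placeholder dates):
#   top_services 2024-05-01 2024-06-01
#   top_services 2024-06-01 2024-07-01
```

Once a service stands out, regrouping by `USAGE_TYPE` in the same call narrows it down further.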

Drill 10: Multi-Account / Multi-Project

Difficulty: Hard

Q: You manage 5 AWS accounts (dev, staging, prod, shared-services, security). How do you switch between them efficiently?

Answer
# AWS: Named profiles + SSO
aws configure sso
# Creates profiles in ~/.aws/config

# Switch with:
export AWS_PROFILE=prod
aws sts get-caller-identity   # Verify

# Or per-command:
aws s3 ls --profile staging

# Better: use aws-vault for credential management
aws-vault exec prod -- aws s3 ls
aws-vault exec prod -- terraform plan

# GCP equivalent:
gcloud config configurations create prod
gcloud config configurations activate prod
gcloud config set project my-prod-project

# List all configurations:
gcloud config configurations list
Best practices:
- Use SSO/federation, not long-lived keys
- Use `aws-vault` or `granted` for credential management
- Set `region` in each profile (or export `AWS_DEFAULT_REGION`)
- Use Terraform workspaces or separate state files per account
- Never hardcode account IDs; use `data.aws_caller_identity`
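
Before running anything destructive, it is worth confirming which account each profile actually resolves to. A minimal sketch, assuming a hypothetical helper `whoami_all` and profile names that match the ones in `~/.aws/config`.

```shell
#!/bin/sh
# Print "profile<TAB>account-id" for each named profile, or ERROR when
# credentials for that profile are missing or expired.
# Usage: whoami_all <profile>...
whoami_all() (
  for profile in "$@"; do
    account=$(aws sts get-caller-identity --profile "$profile" \
      --query Account --output text 2>/dev/null) || account="ERROR"
    printf '%s\t%s\n' "$profile" "$account"
  done
)

# Example: whoami_all dev staging prod shared-services security
```

An `ERROR` row for an SSO profile usually means the session expired and `aws sso login --profile <name>` is needed.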
