Cloud Provider Deep-Dive Cheat Sheet

IAM / Identity Federation

Under the hood: IRSA (AWS) and Workload Identity (GCP) both work by projecting a signed service account token into the pod. The cloud IAM service trusts the cluster's OIDC issuer, verifies the token's audience and subject claims, and exchanges it for short-lived cloud credentials. No long-lived keys are stored anywhere; the projected token and the derived credentials are rotated automatically by the kubelet and the cloud SDKs.

AWS EKS: IRSA (IAM Roles for Service Accounts)

# 1. Annotate ServiceAccount
apiVersion: v1
kind: ServiceAccount
metadata:
  name: s3-reader
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/s3-reader

# 2. Create IAM role with OIDC trust
aws iam create-role --role-name s3-reader \
  --assume-role-policy-document file://trust-policy.json

# 3. Attach permissions
aws iam attach-role-policy --role-name s3-reader \
  --policy-arn arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess
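The trust-policy.json referenced in step 2 might look like the following sketch; the OIDC provider ID, region, account ID, and namespace are placeholders you must replace with your cluster's values. The sub condition pins the role to exactly one ServiceAccount:

```json
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": {
      "Federated": "arn:aws:iam::123456789012:oidc-provider/oidc.eks.us-east-1.amazonaws.com/id/EXAMPLED539D4633E53DE1B71EXAMPLE"
    },
    "Action": "sts:AssumeRoleWithWebIdentity",
    "Condition": {
      "StringEquals": {
        "oidc.eks.us-east-1.amazonaws.com/id/EXAMPLED539D4633E53DE1B71EXAMPLE:sub": "system:serviceaccount:default:s3-reader",
        "oidc.eks.us-east-1.amazonaws.com/id/EXAMPLED539D4633E53DE1B71EXAMPLE:aud": "sts.amazonaws.com"
      }
    }
  }]
}
```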

GCP GKE: Workload Identity

apiVersion: v1
kind: ServiceAccount
metadata:
  name: gcs-reader
  annotations:
    iam.gke.io/gcp-service-account: gcs-reader@project.iam.gserviceaccount.com

gcloud iam service-accounts add-iam-policy-binding \
  gcs-reader@project.iam.gserviceaccount.com \
  --member="serviceAccount:project.svc.id.goog[namespace/gcs-reader]" \
  --role="roles/iam.workloadIdentityUser"
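A workload picks up the federated identity simply by referencing the annotated KSA; with Workload Identity enabled on the node pool, GCP client libraries find credentials automatically. A minimal test pod (the namespace must match the [namespace/gcs-reader] member in the binding above; pod name and image are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: wi-test
  namespace: namespace
spec:
  serviceAccountName: gcs-reader
  nodeSelector:
    iam.gke.io/gke-metadata-server-enabled: "true"
  containers:
  - name: main
    image: google/cloud-sdk:slim
    command: ["gcloud", "auth", "list"]   # should show the GSA identity
```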

Load Balancers

| Type      | Layer | Use Case                        | Static IP     | Cost |
|-----------|-------|---------------------------------|---------------|------|
| ALB (AWS) | 7     | HTTP/HTTPS, path routing        | No            | $$   |
| NLB (AWS) | 4     | TCP/UDP, gRPC, low latency      | Yes           | $    |
| CLB (AWS) | 4/7   | Legacy; avoid for new workloads | No            | $    |
| Google LB | 7     | HTTP(S), global                 | Yes (anycast) | $$   |

# AWS NLB via Service
apiVersion: v1
kind: Service
metadata:
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-type: external
    service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: ip
    service.beta.kubernetes.io/aws-load-balancer-scheme: internet-facing
spec:
  type: LoadBalancer

# AWS ALB via Ingress
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  annotations:
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/target-type: ip
spec:
  ingressClassName: alb  # preferred over kubernetes.io/ingress.class annotation

VPC / Networking

Gotcha: AWS ALB does not support static IP addresses — use NLB if you need a fixed IP (e.g., for DNS A records or firewall whitelisting). ALB IPs change as AWS scales the load balancer. The workaround is to put a Global Accelerator in front of the ALB.
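The Global Accelerator workaround sketched above looks roughly like this with the AWS CLI; listener ports, region, and the <...> ARNs are placeholders. The accelerator provides two static anycast IPs that front the ALB:

```
# Create an accelerator (returns two static anycast IPs)
aws globalaccelerator create-accelerator --name alb-front --ip-address-type IPV4

# Add a TCP listener and point an endpoint group at the ALB
aws globalaccelerator create-listener --accelerator-arn <accelerator-arn> \
  --protocol TCP --port-ranges FromPort=443,ToPort=443
aws globalaccelerator create-endpoint-group --listener-arn <listener-arn> \
  --endpoint-group-region us-east-1 \
  --endpoint-configurations EndpointId=<alb-arn>,Weight=100
```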

Subnet IP Exhaustion (EKS)

# Check available IPs
aws ec2 describe-subnets --subnet-ids subnet-xxx \
  --query 'Subnets[].AvailableIpAddressCount'

# EKS VPC CNI max pods per node = ENIs × (IPv4 addrs per ENI - 1) + 2
# m5.large: 3 ENIs × (10 - 1) + 2 = 29 pod IPs

# Fix: Enable prefix delegation (16 IPs per slot instead of 1)
kubectl set env daemonset aws-node -n kube-system \
  ENABLE_PREFIX_DELEGATION=true

# Or: Add secondary CIDR
aws ec2 associate-vpc-cidr-block --vpc-id vpc-xxx \
  --cidr-block 100.64.0.0/16
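The max-pods arithmetic above can be sanity-checked with plain shell arithmetic; the ENI and per-ENI IP limits used here are the published m5.large values:

```shell
# max_pods = ENIs * (IPv4 addresses per ENI - 1) + 2
# (each ENI's primary IP is reserved; +2 covers the host-networked
#  aws-node and kube-proxy pods, which need no pod IP)
enis=3          # m5.large ENI limit
ips_per_eni=10  # IPv4 addresses per ENI on m5.large
echo $(( enis * (ips_per_eni - 1) + 2 ))   # prints 29
```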

Security Groups for Pods (EKS)

apiVersion: vpcresources.k8s.aws/v1beta1
kind: SecurityGroupPolicy
metadata:
  name: db-access
spec:
  podSelector:
    matchLabels:
      role: backend
  securityGroups:
    groupIds:
    - sg-xxx  # Allow DB access
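SecurityGroupPolicy only takes effect once pod ENIs are enabled in the VPC CNI (Nitro-based instance types only); the toggle mirrors the prefix-delegation one:

```
kubectl set env daemonset aws-node -n kube-system ENABLE_POD_ENI=true
```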

Storage

| AWS            | GCP         | Use Case            | IOPS           |
|----------------|-------------|---------------------|----------------|
| gp3            | pd-balanced | General purpose     | 3,000-16,000   |
| io2            | pd-ssd      | Databases           | Up to 256,000  |
| st1            | pd-standard | Throughput-oriented | Low            |
| Instance store | Local SSD   | Temp/cache          | Very high      |

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-ssd
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
  iops: "5000"
  throughput: "250"
allowVolumeExpansion: true
reclaimPolicy: Retain
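A PVC bound to the class above (claim name and size are placeholders); because the class sets allowVolumeExpansion: true, the storage request can later be edited upward in place:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: fast-ssd
  resources:
    requests:
      storage: 100Gi
```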

Default trap: EKS VPC CNI assigns one IP per pod from the node's subnet. An m5.large supports only 29 pod IPs — you can exhaust subnet IPs quickly with many small pods. Enable prefix delegation (ENABLE_PREFIX_DELEGATION=true) to get 16 IPs per ENI slot instead of 1, dramatically increasing pod density.

Node Groups / Instance Strategy

System workloads    → On-demand, m5.large, tainted
Web/API (stateless) → Spot, multi-instance-type, multi-AZ
Batch jobs          → Spot, scale-to-zero capable
Databases           → On-demand, local NVMe (i3/i4)
ML training         → Spot GPU (p3/g5), checkpointing
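The "tainted" system pool above implies system pods carry a matching toleration; one sketch, where the taint key/value and node label are assumptions, not fixed conventions:

```yaml
# Taint applied to the system nodes, e.g.:
#   kubectl taint nodes <node> dedicated=system:NoSchedule
tolerations:
- key: dedicated
  operator: Equal
  value: system
  effect: NoSchedule
nodeSelector:
  node-role: system   # assumed node label for the system pool
```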

Cost Quick Reference

| Strategy                 | Savings | Risk                         |
|--------------------------|---------|------------------------------|
| Spot instances           | 60-90%  | Interruption (2 min warning) |
| Savings Plans (1 yr)     | 30-40%  | Commitment                   |
| Savings Plans (3 yr)     | 50-60%  | Longer commitment            |
| Right-sizing             | 20-50%  | None                         |
| Scheduled shutdown (dev) | ~65%    | Availability                 |
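Back-of-envelope math behind the table; the monthly price is an illustrative assumption, not a quote:

```shell
on_demand=96       # assumed $/month for one node, on-demand
spot_discount=70   # midpoint of the 60-90% range above
dev_uptime_pct=35  # dev cluster up ~12h x 5 days ≈ 35% of the month → ~65% saved

echo "spot cost/month:     \$$(( on_demand * (100 - spot_discount) / 100 ))"  # $28
echo "scheduled dev/month: \$$(( on_demand * dev_uptime_pct / 100 ))"         # $33
```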