# Cloud Provider Deep-Dive Cheat Sheet
Under the hood: IRSA (AWS) and Workload Identity (GCP) both work by projecting a signed service account token into the pod. The cloud IAM service trusts the cluster's OIDC issuer, verifies the token's audience and subject claims, and issues short-lived cloud credentials. No long-lived keys are stored anywhere; the projected token and the credentials derived from it expire and are refreshed automatically.
## IAM / Identity Federation

### AWS EKS: IRSA (IAM Roles for Service Accounts)

```yaml
# 1. Annotate the ServiceAccount with the role ARN
apiVersion: v1
kind: ServiceAccount
metadata:
  name: s3-reader
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::123:role/s3-reader
```

```bash
# 2. Create IAM role with OIDC trust
aws iam create-role --role-name s3-reader \
  --assume-role-policy-document file://trust-policy.json

# 3. Attach permissions
aws iam attach-role-policy --role-name s3-reader \
  --policy-arn arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess
```
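The `trust-policy.json` referenced above is what binds the role to exactly one ServiceAccount through the cluster's OIDC issuer — it is where the audience and subject checks live. A sketch of its usual shape (the issuer URL, account ID, and namespace are placeholders):

```json
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": {
      "Federated": "arn:aws:iam::123:oidc-provider/oidc.eks.us-east-1.amazonaws.com/id/EXAMPLE"
    },
    "Action": "sts:AssumeRoleWithWebIdentity",
    "Condition": {
      "StringEquals": {
        "oidc.eks.us-east-1.amazonaws.com/id/EXAMPLE:sub": "system:serviceaccount:default:s3-reader",
        "oidc.eks.us-east-1.amazonaws.com/id/EXAMPLE:aud": "sts.amazonaws.com"
      }
    }
  }]
}
```

If the `sub` condition doesn't match `system:serviceaccount:<namespace>:<name>` exactly, `AssumeRoleWithWebIdentity` fails — the most common IRSA setup error.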
### GCP GKE: Workload Identity

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: gcs-reader
  annotations:
    iam.gke.io/gcp-service-account: gcs-reader@project.iam.gserviceaccount.com
```

```bash
gcloud iam service-accounts add-iam-policy-binding \
  gcs-reader@project.iam.gserviceaccount.com \
  --member="serviceAccount:project.svc.id.goog[namespace/gcs-reader]" \
  --role="roles/iam.workloadIdentityUser"
```
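The binding above only takes effect for pods that actually run as that Kubernetes ServiceAccount (and the node pool must have Workload Identity enabled). A minimal pod sketch — the pod name, image, and bucket are placeholders:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gcs-reader-test        # hypothetical test pod
spec:
  serviceAccountName: gcs-reader   # must match the annotated KSA
  containers:
    - name: app
      image: google/cloud-sdk:slim
      command: ["gsutil", "ls", "gs://my-bucket"]   # placeholder bucket
```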
## Load Balancers
| Type | Layer | Use Case | Static IP | Cost |
|---|---|---|---|---|
| ALB (AWS) | 7 | HTTP/HTTPS, path routing | No | $$ |
| NLB (AWS) | 4 | TCP/UDP, gRPC, low latency | Yes | $ |
| CLB (AWS) | 4/7 | Legacy, avoid for new | No | $ |
| Google LB | 7 | HTTP(S), global | Yes (anycast) | $$ |
```yaml
# AWS NLB via Service
apiVersion: v1
kind: Service
metadata:
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-type: external
    service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: ip
    service.beta.kubernetes.io/aws-load-balancer-scheme: internet-facing
spec:
  type: LoadBalancer
```
```yaml
# AWS ALB via Ingress
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  annotations:
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/target-type: ip
spec:
  ingressClassName: alb  # preferred over the kubernetes.io/ingress.class annotation
```
## VPC / Networking
Gotcha: AWS ALB does not support static IP addresses — use NLB if you need a fixed IP (e.g., for DNS A records or firewall whitelisting). ALB IPs change as AWS scales the load balancer. The workaround is to put a Global Accelerator in front of the ALB.
### Subnet IP Exhaustion (EKS)

```bash
# Check available IPs
aws ec2 describe-subnets --subnet-ids subnet-xxx \
  --query 'Subnets[].AvailableIpAddressCount'

# EKS VPC CNI: max pods = ENIs × (IPs per ENI − 1) + 2
# m5.large: 3 × (10 − 1) + 2 = 29 pod IPs

# Fix: enable prefix delegation (a /28 = 16 IPs per address slot instead of 1)
kubectl set env daemonset aws-node -n kube-system \
  ENABLE_PREFIX_DELEGATION=true

# Or: add a secondary CIDR
aws ec2 associate-vpc-cidr-block --vpc-id vpc-xxx \
  --cidr-block 100.64.0.0/16
```
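The pod-density arithmetic can be checked directly. The numbers below use the m5.large figures from above; the prefix-delegation result assumes a /28 (16 IPs) per address slot and ignores the kubelet's max-pods cap, which lands lower in practice:

```shell
# VPC CNI default: max pods = ENIs * (IPs per ENI - 1) + 2
enis=3
ips_per_eni=10                                   # m5.large
max_pods=$(( enis * (ips_per_eni - 1) + 2 ))
echo "default max pods: $max_pods"               # 29

# Prefix delegation: each address slot carries a /28 (16 IPs)
max_pods_pd=$(( enis * (ips_per_eni - 1) * 16 + 2 ))
echo "with prefix delegation: $max_pods_pd"      # 434 (kubelet caps lower in practice)
```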
### Security Groups for Pods (EKS)

```yaml
apiVersion: vpcresources.k8s.aws/v1beta1
kind: SecurityGroupPolicy
metadata:
  name: db-access
spec:
  podSelector:
    matchLabels:
      role: backend
  securityGroups:
    groupIds:
      - sg-xxx  # Allow DB access
```
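The policy applies only to pods whose labels match the selector — a pod like this hypothetical one would get a branch ENI with `sg-xxx` attached. Note the feature requires Nitro-based instances and `ENABLE_POD_ENI=true` on the VPC CNI:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: backend-test      # hypothetical pod
  labels:
    role: backend         # matches the SecurityGroupPolicy podSelector
spec:
  containers:
    - name: app
      image: nginx
```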
## Storage
| AWS | GCP | Use Case | IOPS |
|---|---|---|---|
| gp3 | pd-balanced | General purpose | 3,000-16,000 |
| io2 | pd-ssd | Databases | Up to 256,000 |
| st1 | pd-standard | Throughput | Low |
| Instance store | Local SSD | Temp/cache | Very high |
```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-ssd
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
  iops: "5000"
  throughput: "250"
allowVolumeExpansion: true
reclaimPolicy: Retain
```
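A PVC that consumes the class (the claim name and size are illustrative). Because `allowVolumeExpansion` is set, the volume can be grown later by editing `spec.resources.requests.storage`:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data                  # hypothetical claim name
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: fast-ssd
  resources:
    requests:
      storage: 100Gi
```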
Default trap: the EKS VPC CNI assigns one IP per pod from the node's subnet. An m5.large supports only 29 pod IPs, so many small pods can exhaust subnet IPs quickly. Enable prefix delegation (`ENABLE_PREFIX_DELEGATION=true`) to get 16 IPs per ENI slot instead of 1, dramatically increasing pod density.
## Node Groups / Instance Strategy
System workloads → On-demand, m5.large, tainted
Web/API (stateless) → Spot, multi-instance-type, multi-AZ
Batch jobs → Spot, scale-to-zero capable
Databases → On-demand, local NVMe (i3/i4)
ML training → Spot GPU (p3/g5), checkpointing
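One way to express the first two rows is an eksctl config — a hedged sketch, with group names and instance types purely illustrative:

```yaml
# Fragment of a hypothetical eksctl ClusterConfig
managedNodeGroups:
  - name: system
    instanceType: m5.large
    taints:
      - key: CriticalAddonsOnly
        value: "true"
        effect: NoSchedule      # keeps general workloads off system nodes
  - name: web
    instanceTypes: [m5.large, m5a.large, m4.large]  # diversify types for Spot capacity
    spot: true
```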
## Cost Quick Reference
| Strategy | Savings | Risk |
|---|---|---|
| Spot instances | 60-90% | Interruption (2 min warning) |
| Savings Plans (1yr) | 30-40% | Commitment |
| Savings Plans (3yr) | 50-60% | Longer commitment |
| Right-sizing | 20-50% | None |
| Scheduled shutdown (dev) | 65% | Availability |
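As a back-of-envelope check on the Spot row — the ~$70/month on-demand price per m5.large and the 80/20 split are assumptions for illustration:

```shell
on_demand=70          # assumed $/month per m5.large on-demand
nodes=10
spot_nodes=8          # keep 2 on-demand for system workloads
spot_discount=70      # percent, mid-range of the 60-90% table row

all_od=$(( nodes * on_demand ))
blended=$(( (nodes - spot_nodes) * on_demand + spot_nodes * on_demand * (100 - spot_discount) / 100 ))
echo "all on-demand: \$$all_od/mo, blended: \$$blended/mo"
```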