Cloud Provider Deep-Dive Drills

Remember: Cloud workload identity follows the same pattern across providers: a Kubernetes ServiceAccount is mapped to a cloud IAM identity via OIDC federation. AWS calls it IRSA, GCP calls it Workload Identity, and Azure calls it Azure AD Workload Identity (since renamed Microsoft Entra Workload ID). The mechanism is the same: the pod gets a JWT, presents it to the cloud's token service (STS), and receives temporary cloud credentials scoped to the mapped role.

Gotcha: IRSA and Workload Identity require both sides to grant access: the Kubernetes ServiceAccount must be annotated with the cloud role, AND the cloud IAM trust policy must allow the specific ServiceAccount. Forgetting either side results in AccessDenied with no obvious error about which side is wrong. Check both.

Drill 1: IAM Roles for Service Accounts (IRSA)

Difficulty: Medium

Q: How does IRSA (IAM Roles for Service Accounts) work on EKS? Why is it better than using node instance roles?

Answer
How IRSA works:
1. Pod runs under a K8s ServiceAccount annotated with the IAM role ARN
2. Mutating webhook injects AWS_ROLE_ARN, AWS_WEB_IDENTITY_TOKEN_FILE, and a projected token volume
3. AWS SDK in the pod calls STS AssumeRoleWithWebIdentity with that token
4. STS validates the token against the cluster's OIDC provider
5. Pod gets temporary credentials scoped to the IAM role
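# ServiceAccount with IAM role
apiVersion: v1
kind: ServiceAccount
metadata:
  name: s3-reader
  namespace: production  # must match the namespace in the trust policy's sub condition
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/s3-reader-role
// IAM trust policy (AWS account IDs are 12 digits; sub pins the namespace and ServiceAccount)
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": {
      "Federated": "arn:aws:iam::123456789012:oidc-provider/oidc.eks.us-east-1.amazonaws.com/id/ABCDEF"
    },
    "Action": "sts:AssumeRoleWithWebIdentity",
    "Condition": {
      "StringEquals": {
        "oidc.eks.us-east-1.amazonaws.com/id/ABCDEF:sub": "system:serviceaccount:production:s3-reader",
        "oidc.eks.us-east-1.amazonaws.com/id/ABCDEF:aud": "sts.amazonaws.com"
      }
    }
  }]
}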
Why better than node roles:
- **Least privilege**: Each pod gets only the permissions it needs
- **No credential sharing**: Pods on the same node can have different roles
- **No IMDS attacks**: Credentials come from the projected token rather than the instance metadata service, so pods do not depend on the node role's credentials
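When AssumeRoleWithWebIdentity is denied, it can help to compare the claims in the pod's token with the trust policy condition. The following is a minimal sketch, not an AWS tool: the function names are hypothetical, the token is decoded without signature verification (for inspection only), and only `StringEquals` conditions are considered.

```python
import base64
import json


def decode_jwt_payload(token: str) -> dict:
    """Decode the payload segment of a JWT without verifying its signature."""
    payload_b64 = token.split(".")[1]
    payload_b64 += "=" * (-len(payload_b64) % 4)  # restore stripped base64 padding
    return json.loads(base64.urlsafe_b64decode(payload_b64))


def expected_subject(namespace: str, service_account: str) -> str:
    """Subject claim format used in Kubernetes ServiceAccount tokens."""
    return f"system:serviceaccount:{namespace}:{service_account}"


def trust_condition_mismatches(statement: dict, issuer_host: str, claims: dict) -> list:
    """Return the StringEquals condition keys that the token claims do not satisfy."""
    wanted = statement.get("Condition", {}).get("StringEquals", {})
    prefix = issuer_host + ":"
    mismatches = []
    for key, value in wanted.items():
        claim_name = key[len(prefix):] if key.startswith(prefix) else key
        actual = claims.get(claim_name)
        # Claims such as "aud" may be a single string or a list of strings.
        actual_values = actual if isinstance(actual, list) else [actual]
        if value not in actual_values:
            mismatches.append(key)
    return mismatches
```

Inside a pod, the token can be read from the file named by the `AWS_WEB_IDENTITY_TOKEN_FILE` environment variable. An empty result from `trust_condition_mismatches` suggests the trust policy conditions match the token, which points the search toward the ServiceAccount annotation or the role's permission policy instead.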

Drill 2: GCP Workload Identity

Difficulty: Medium

Q: What is the GCP equivalent of IRSA, and how do you set it up?

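Answer
GCP **Workload Identity** (now documented as Workload Identity Federation for GKE) maps a K8s ServiceAccount to a GCP Service Account. On Standard clusters it must be enabled on the cluster and its node pools first.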
# 1. Create GCP Service Account
gcloud iam service-accounts create gcs-reader

# 2. Grant GCP permissions
gcloud projects add-iam-policy-binding my-project \
  --member="serviceAccount:gcs-reader@my-project.iam.gserviceaccount.com" \
  --role="roles/storage.objectViewer"

# 3. Allow K8s SA to impersonate GCP SA
gcloud iam service-accounts add-iam-policy-binding \
  gcs-reader@my-project.iam.gserviceaccount.com \
  --member="serviceAccount:my-project.svc.id.goog[production/gcs-reader]" \
  --role="roles/iam.workloadIdentityUser"
# 4. Annotate K8s ServiceAccount
apiVersion: v1
kind: ServiceAccount
metadata:
  name: gcs-reader
  namespace: production
  annotations:
    iam.gke.io/gcp-service-account: gcs-reader@my-project.iam.gserviceaccount.com
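The member string in step 3 and the annotation in step 4 are the two halves that must agree. A minimal sketch of a consistency check follows; the function names are hypothetical, and the inputs are assumed to be a parsed ServiceAccount manifest plus the list of members already holding `roles/iam.workloadIdentityUser` on the GCP service account.

```python
def workload_identity_member(project_id: str, namespace: str, ksa_name: str) -> str:
    """Build the IAM member string for a Kubernetes ServiceAccount."""
    return f"serviceAccount:{project_id}.svc.id.goog[{namespace}/{ksa_name}]"


def binding_problems(ksa_manifest: dict, project_id: str,
                     gsa_email: str, binding_members: list) -> list:
    """Report which side of the mapping is missing: Kubernetes, GCP, or both."""
    problems = []
    meta = ksa_manifest.get("metadata", {})

    # Kubernetes side: the annotation must name the GCP service account.
    annotation = meta.get("annotations", {}).get("iam.gke.io/gcp-service-account")
    if annotation != gsa_email:
        problems.append("Kubernetes side: annotation does not name the GCP service account")

    # GCP side: the binding must name this exact namespace and ServiceAccount.
    member = workload_identity_member(
        project_id, meta.get("namespace", "default"), meta.get("name", ""))
    if member not in binding_members:
        problems.append(f"GCP side: no workloadIdentityUser binding for {member}")
    return problems
```

For the values used in this drill, `workload_identity_member("my-project", "production", "gcs-reader")` produces the same member string passed to `--member` in step 3.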

Drill 3: VPC Subnet IP Exhaustion

Difficulty: Hard

Q: Pods are stuck in Pending with "failed to allocate for range: no available addresses." What happened and how do you fix it?

Answer
# 1. Check node CIDR allocations
kubectl get nodes -o json | jq '.items[] | {name: .metadata.name, podCIDR: .spec.podCIDR}'

# 2. Check how many IPs are used per node (EKS with VPC CNI)
kubectl get nodes -o json | jq '.items[] | {
  name: .metadata.name,
  allocatable_pods: .status.allocatable.pods
}'

# 3. For EKS: check ENI allocation
aws ec2 describe-network-interfaces \
  --filters Name=tag:node.k8s.amazonaws.com/instance_id,Values=i-xxx \
  | jq '.NetworkInterfaces | length'
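Fixes:
1. **Immediate**: Enable prefix delegation (EKS VPC CNI). This raises the number of pod IPs a Nitro-based node can hold; it does not add addresses to a subnet that is already full.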
kubectl set env daemonset aws-node -n kube-system ENABLE_PREFIX_DELEGATION=true
2. **Medium-term**: Add secondary CIDR to VPC
aws ec2 associate-vpc-cidr-block --vpc-id vpc-xxx --cidr-block 100.64.0.0/16
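3. **Long-term**: Use larger subnets or overlay networking.

Why this happens: the EKS VPC CNI allocates real VPC IPs to pods, so pods compete with everything else in the subnet for addresses. Each instance type also has a per-node limit of max ENIs × IPs per ENI.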
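The per-node limit can be estimated from the ENI figures for an instance type. The sketch below uses the commonly cited VPC CNI formulas; the function names are hypothetical, the ENI and IP counts must be looked up per instance type, and the cap of 110 is an assumption based on the usual recommendation for smaller instances.

```python
def max_pods_secondary_ip(enis: int, ips_per_eni: int) -> int:
    """Pod limit in secondary-IP mode.

    One address on each ENI is the ENI's own primary IP; the +2 accounts
    for host-network pods such as aws-node and kube-proxy.
    """
    return enis * (ips_per_eni - 1) + 2


def max_pods_prefix_delegation(enis: int, ips_per_eni: int, cap: int = 110) -> int:
    """Pod limit with prefix delegation: each slot holds a /28 prefix (16 IPs)."""
    return min(enis * (ips_per_eni - 1) * 16 + 2, cap)


# m5.large: 3 ENIs with 10 IPv4 addresses each
print(max_pods_secondary_ip(3, 10))       # 29
print(max_pods_prefix_delegation(3, 10))  # 110 (434 before the cap)
```

Prefix delegation changes how many addresses a node may hold, but every /28 prefix still comes out of the subnet, so the subnet itself must have contiguous free space.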

Drill 4: ALB vs NLB

Difficulty: Easy

Q: When would you use an ALB (Application Load Balancer) vs NLB (Network Load Balancer) for Kubernetes?

Answer

| Feature | ALB (Layer 7) | NLB (Layer 4) |
|---------|---------------|---------------|
| Protocol | HTTP/HTTPS/gRPC | TCP/UDP/TLS |
| Routing | Path, host, header-based | Port-based only |
| TLS termination | Yes (with ACM certs) | Yes (TLS passthrough or termination) |
| WebSockets | Yes | Yes |
| Static IP | No (use Global Accelerator) | Yes |
| Latency | Higher (~ms for L7 processing) | Lower (packet-level forwarding) |
| Health checks | HTTP health checks | TCP/HTTP health checks |
| Use case | Web apps, APIs, microservices | gRPC, TCP services, low latency |
# ALB (via AWS Load Balancer Controller)
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  annotations:
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/target-type: ip
spec:
  ingressClassName: alb  # preferred over kubernetes.io/ingress.class annotation
  rules: [...]

---
# NLB (via AWS Load Balancer Controller)
apiVersion: v1
kind: Service
metadata:
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-type: external
    service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: ip
spec:
  type: LoadBalancer

Drill 5: Cross-Account Access

Difficulty: Hard

Q: Your EKS pods need to access an S3 bucket in a different AWS account. How do you set this up?

Answer
// 1. In Account B (bucket owner): bucket policy
{
  "Statement": [{
    "Effect": "Allow",
    "Principal": {
      "AWS": "arn:aws:iam::ACCOUNT_A:role/s3-reader-role"
    },
    "Action": ["s3:GetObject", "s3:ListBucket"],
    "Resource": [
      "arn:aws:s3:::cross-account-bucket",
      "arn:aws:s3:::cross-account-bucket/*"
    ]
  }]
}
// 2. In Account A (EKS account): IAM policy on the IRSA role
{
  "Statement": [{
    "Effect": "Allow",
    "Action": ["s3:GetObject", "s3:ListBucket"],
    "Resource": [
      "arn:aws:s3:::cross-account-bucket",
      "arn:aws:s3:::cross-account-bucket/*"
    ]
  }]
}
Both sides must grant access. The IRSA role in Account A needs the IAM policy, and the bucket in Account B needs a bucket policy allowing the role.
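The two-sided rule can be modeled as a logical AND over the two policies. This is a deliberately simplified sketch, not the real IAM evaluation logic: it ignores explicit Deny, conditions, and case-insensitive action matching, and the function names are hypothetical.

```python
from fnmatch import fnmatchcase


def _as_list(value):
    return value if isinstance(value, list) else [value]


def statements_allow(statements, action, resource, principal=None):
    """True if any Allow statement covers the action and resource.

    When principal is given (resource policies), the statement must also
    name that principal ARN under Principal.AWS.
    """
    for st in statements:
        if st.get("Effect") != "Allow":
            continue
        if principal is not None:
            block = st.get("Principal", {})
            named = block.get("AWS", []) if isinstance(block, dict) else block
            if principal not in _as_list(named):
                continue
        action_ok = any(fnmatchcase(action, p) for p in _as_list(st["Action"]))
        resource_ok = any(fnmatchcase(resource, p) for p in _as_list(st["Resource"]))
        if action_ok and resource_ok:
            return True
    return False


def cross_account_allowed(identity_policy, bucket_policy, role_arn, action, resource):
    """Cross-account access needs an Allow on BOTH the role and the bucket."""
    return (statements_allow(identity_policy["Statement"], action, resource)
            and statements_allow(bucket_policy["Statement"], action, resource,
                                 principal=role_arn))
```

Feeding it the two policies above with `s3:GetObject` on `arn:aws:s3:::cross-account-bucket/key` returns `True`; removing either policy's statement makes it `False`, which mirrors the AccessDenied seen when one side is missing.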

Drill 6: Node Group Strategy

Difficulty: Medium

Q: Design a node group strategy for a production EKS cluster running mixed workloads (web apps, batch jobs, databases).

Answer
# 1. System node group (always-on, on-demand)
- name: system
  instanceTypes: [m5.large]
  capacityType: ON_DEMAND
  minSize: 2
  maxSize: 4
  taints: [{key: CriticalAddonsOnly, effect: NoSchedule}]
  labels: {workload-type: system}

# 2. Web apps (spot, diversified across instance types)
- name: web
  instanceTypes: [m5.xlarge, m5a.xlarge, m6i.xlarge]
  capacityType: SPOT
  minSize: 2
  maxSize: 20
  labels: {workload-type: web}

# 3. Batch/Jobs (spot, cost-optimized)
- name: batch
  instanceTypes: [c5.2xlarge, c5a.2xlarge, c6i.2xlarge]
  capacityType: SPOT
  minSize: 0
  maxSize: 50
  labels: {workload-type: batch}
  taints: [{key: workload, value: batch, effect: NoSchedule}]

# 4. Databases (on-demand, local SSD)
- name: database
  instanceTypes: [i3.xlarge]
  capacityType: ON_DEMAND
  minSize: 3
  maxSize: 6
  labels: {workload-type: database}
  taints: [{key: workload, value: database, effect: NoSchedule}]
Key principles:
- System workloads on dedicated on-demand nodes
- Stateless web apps on spot with multi-AZ spread
- Batch jobs on spot with scale-to-zero
- Databases on on-demand with local NVMe storage
- Use taints to isolate workload types
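Taints only keep other pods away; a pod still needs a matching toleration to be admitted and a node selector to be steered onto the right group. The sketch below derives both pod spec fields from the labels and taints in the node group definitions above. The `NODE_GROUP_TAINTS` table and the `scheduling_fields` helper are hypothetical names for illustration.

```python
# Taints copied from the node group definitions above (None = untainted group).
NODE_GROUP_TAINTS = {
    "system": {"key": "CriticalAddonsOnly", "effect": "NoSchedule"},
    "web": None,
    "batch": {"key": "workload", "value": "batch", "effect": "NoSchedule"},
    "database": {"key": "workload", "value": "database", "effect": "NoSchedule"},
}


def scheduling_fields(workload_type: str) -> dict:
    """Return the nodeSelector and tolerations a pod needs for a node group."""
    taint = NODE_GROUP_TAINTS[workload_type]
    fields = {"nodeSelector": {"workload-type": workload_type}}
    if taint is not None:
        toleration = {"key": taint["key"], "effect": taint["effect"]}
        if "value" in taint:
            # Match the taint's exact value.
            toleration["operator"] = "Equal"
            toleration["value"] = taint["value"]
        else:
            # Taint has no value, so tolerate any taint with this key.
            toleration["operator"] = "Exists"
        fields["tolerations"] = [toleration]
    return fields
```

The returned dictionary can be merged into a pod template's `spec`. A toleration without the node selector would let a batch pod run on batch nodes but would not stop it from landing on the untainted web group.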
