AWS IAM - Street-Level Ops

Real-world IAM workflows for production environments. These are the procedures you reach for during incidents, audits, and day-to-day operations.

Auditing Who Has Admin Access

This is the first thing security will ask during an audit. You need to find every principal with admin-equivalent permissions.

# Find all users with AdministratorAccess managed policy
aws iam list-entities-for-policy \
  --policy-arn arn:aws:iam::aws:policy/AdministratorAccess \
  --query '{Users:PolicyUsers[].UserName,Roles:PolicyRoles[].RoleName,Groups:PolicyGroups[].GroupName}'

# Find users whose inline policies allow the "*" action
# (jq normalizes Statement and Action, which may each be a single value or a list)
for user in $(aws iam list-users --query 'Users[].UserName' --output text); do
  for policy in $(aws iam list-user-policies --user-name "$user" --query 'PolicyNames[]' --output text); do
    doc=$(aws iam get-user-policy --user-name "$user" --policy-name "$policy" \
      --query 'PolicyDocument' --output json)
    if echo "$doc" | jq -e '[.Statement] | flatten
        | any(.Effect == "Allow" and ([.Action] | flatten | any(. == "*")))' >/dev/null; then
      echo "ADMIN: user=$user policy=$policy"
    fi
  done
done

# Find roles with broad assume-role trust (a "*" principal, as a string or inside a list)
for role in $(aws iam list-roles --query 'Roles[].RoleName' --output text); do
  trust=$(aws iam get-role --role-name "$role" \
    --query 'Role.AssumeRolePolicyDocument' --output json 2>/dev/null)
  if echo "$trust" | jq -e '[.Statement] | flatten
      | any(.Effect == "Allow" and (.Principal == "*"
          or ([.Principal.AWS?] | flatten | any(. == "*"))))' >/dev/null; then
    echo "WIDE TRUST: role=$role"
  fi
done

To find resources and roles that are reachable from outside the account, use IAM Access Analyzer:

# List active external-access findings
aws accessanalyzer list-findings \
  --analyzer-arn arn:aws:access-analyzer:us-east-1:123456789012:analyzer/account-analyzer \
  --query 'findings[?status==`ACTIVE`].{Resource:resource,Type:resourceType,External:principal}' \
  --output table

Finding Unused IAM Users and Roles

Unused principals are attack surface. Find and remove them.

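# Users who have never signed in or used an access key
# (poll until the credential report has finished generating)
until [ "$(aws iam generate-credential-report --query 'State' --output text)" = "COMPLETE" ]; do
  sleep 2
done
aws iam get-credential-report --query 'Content' --output text | \
  base64 -d | \
  awk -F, 'NR>1 {
    user=$1; pass_last=$5; key1_last=$11; key2_last=$16;
    # password_last_used is "N/A" for users without a password and
    # "no_information" when a password exists but was never used
    if ((pass_last == "no_information" || pass_last == "N/A") && key1_last == "N/A" && key2_last == "N/A")
      print "NEVER USED: " user
    else if (pass_last != "no_information" && pass_last != "N/A" && pass_last != "not_supported")
      print user ": last password use=" pass_last
  }'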
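
The report above only separates never-used principals from the rest. A minimal sketch for applying an age threshold to the same CSV follows. The function name stale_users is made up for illustration, and it assumes the report's timestamps are ISO 8601 in UTC, so plain string comparison orders them correctly.

```shell
# stale_users CUTOFF: read a credential report CSV on stdin and print users
# whose most recent password or access key use is older than CUTOFF
# (for example 2024-01-01), or who show no use at all.
stale_users() {
  awk -F, -v cutoff="$1" 'NR > 1 {
    latest = ""
    # Columns 5, 11, 16: password_last_used, access_key_1_last_used_date,
    # access_key_2_last_used_date. Placeholder values (N/A, no_information,
    # not_supported) do not start with a digit and are skipped.
    n = split($5 "," $11 "," $16, seen, ",")
    for (i = 1; i <= n; i++)
      if (seen[i] ~ /^[0-9]/ && seen[i] > latest) latest = seen[i]
    if (latest == "")         print $1 "\tnever"
    else if (latest < cutoff) print $1 "\t" latest
  }'
}

# Usage (assumes GNU date, as elsewhere on this page):
#   aws iam get-credential-report --query 'Content' --output text | base64 -d | \
#     stale_users "$(date -u -d '90 days ago' +%Y-%m-%dT%H:%M:%S)"
```

Each output line is the user name and either the last activity timestamp or "never", separated by a tab, which keeps it easy to feed into a ticket or a follow-up script.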

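# Roles with the app- prefix not used in the last 90 days
# (RoleLastUsed comes from get-role and covers the trailing 400 days; needs GNU date)
cutoff=$(date -u -d '90 days ago' +%Y-%m-%dT%H:%M:%S)
for role in $(aws iam list-roles --query 'Roles[?starts_with(RoleName, `app-`)].[RoleName]' --output text); do
  last_used=$(aws iam get-role --role-name "$role" \
    --query 'Role.RoleLastUsed.LastUsedDate' --output text)
  if [ "$last_used" = "None" ]; then
    echo "NEVER USED: $role"
  elif [[ "$last_used" < "$cutoff" ]]; then
    echo "STALE (90+ days): $role last used: $last_used"
  else
    echo "$role last used: $last_used"
  fi
done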

# Use Access Analyzer to find unused permissions on a role
aws accessanalyzer list-findings \
  --analyzer-arn arn:aws:access-analyzer:us-east-1:123456789012:analyzer/unused-access \
  --filter '{"resourceType":{"eq":["AWS::IAM::Role"]}}'

Debugging "Access Denied"

One-liner: When you get Access Denied, the first command is always aws sts get-caller-identity. In 40%+ of cases, you are authenticated as the wrong principal -- a node role instead of a pod role, a default profile instead of the assumed role, or a stale session token.

The systematic approach to "I get Access Denied and I don't know why."

# Step 1: Who am I? (wrong identity is the #1 cause)
aws sts get-caller-identity
# Verify: account, role/user, and session name

# Step 2: What am I trying to do? Simulate it.
aws iam simulate-principal-policy \
  --policy-source-arn arn:aws:iam::123456789012:role/app-role \
  --action-names s3:PutObject \
  --resource-arns "arn:aws:s3:::prod-data/uploads/*" \
  --query 'EvaluationResults[].{Action:EvalActionName,Decision:EvalDecision,Matched:MatchedStatements}'

# Step 3: Decode the authorization failure (if you got one)
aws sts decode-authorization-message \
  --encoded-message "<paste-encoded-message>" | \
  jq '.DecodedMessage | fromjson'
# This shows: which policy denied, what action, what resource, what conditions failed

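# Step 4: Check CloudTrail for the denial event
# lookup-events returns management events only. S3 object-level calls such as
# PutObject are data events: they are recorded only if the trail logs data events,
# and have to be queried from the trail's log bucket (for example with Athena).
# Example for a management event (replace AssumeRole with the denied API call):
aws cloudtrail lookup-events \
  --lookup-attributes AttributeKey=EventName,AttributeValue=AssumeRole \
  --start-time "$(date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%SZ)" \
  --query 'Events[?contains(CloudTrailEvent, `AccessDenied`)].{Time:EventTime,Event:CloudTrailEvent}' \
  --output json | jq '.[].Event | fromjson | {errorCode, errorMessage, requestParameters}'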

# Step 5: Check ALL policy layers
# Identity policies
aws iam list-attached-role-policies --role-name app-role
aws iam list-role-policies --role-name app-role

# Permission boundary
aws iam get-role --role-name app-role \
  --query 'Role.PermissionsBoundary'

# Resource policy (example: S3 bucket policy)
aws s3api get-bucket-policy --bucket prod-data | jq '.Policy | fromjson'

# VPC endpoint policy (if using VPC endpoints)
aws ec2 describe-vpc-endpoints \
  --filters "Name=service-name,Values=com.amazonaws.us-east-1.s3" \
  --query 'VpcEndpoints[].PolicyDocument'

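# SCPs (requires org management account access)
# This lists SCPs attached directly to the account; repeat with the parent OU and root IDs for inherited ones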
aws organizations list-policies-for-target \
  --target-id 123456789012 \
  --filter SERVICE_CONTROL_POLICY
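
Step 1 only helps if the ARN is read carefully. A small helper along these lines can make the identity type explicit. describe_caller_arn is a made-up name, and it only recognizes the ARN shapes listed in its case statement.

```shell
# describe_caller_arn ARN: classify the Arn value returned by
# `aws sts get-caller-identity` so a wrong identity stands out.
describe_caller_arn() {
  local arn=$1 account resource rest
  account=$(printf '%s' "$arn" | cut -d: -f5)
  resource=${arn#*:*:*:*:*:}   # everything after the fifth colon
  case $resource in
    root)
      echo "account=$account type=root" ;;
    user/*)
      echo "account=$account type=user name=${resource##*/}" ;;
    assumed-role/*)
      rest=${resource#assumed-role/}
      echo "account=$account type=assumed-role role=${rest%%/*} session=${rest#*/}" ;;
    federated-user/*)
      echo "account=$account type=federated-user name=${resource#federated-user/}" ;;
    *)
      echo "account=$account type=unknown resource=$resource" ;;
  esac
}

# Usage:
#   describe_caller_arn "$(aws sts get-caller-identity --query 'Arn' --output text)"
```

On EKS, a session name that looks like an instance ID usually means the pod fell back to the node role rather than its own role.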

Cross-Account Role Setup

Setting up a role in Account B that Account A can assume.

# In Account B (target): create the role
cat > trust-policy.json << 'EOF'
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": {
      "AWS": "arn:aws:iam::111111111111:role/ci-pipeline"
    },
    "Action": "sts:AssumeRole",
    "Condition": {
      "StringEquals": {
        "sts:ExternalId": "prod-deploy-2024"
      }
    }
  }]
}
EOF

aws iam create-role \
  --role-name cross-account-deploy \
  --assume-role-policy-document file://trust-policy.json

aws iam attach-role-policy \
  --role-name cross-account-deploy \
  --policy-arn arn:aws:iam::aws:policy/AmazonECS_FullAccess

# In Account A (source): grant the CI role permission to assume
cat > assume-policy.json << 'EOF'
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Action": "sts:AssumeRole",
    "Resource": "arn:aws:iam::222222222222:role/cross-account-deploy"
  }]
}
EOF

aws iam put-role-policy \
  --role-name ci-pipeline \
  --policy-name assume-target-account \
  --policy-document file://assume-policy.json

# Test the assumption
aws sts assume-role \
  --role-arn arn:aws:iam::222222222222:role/cross-account-deploy \
  --role-session-name test-session \
  --external-id prod-deploy-2024
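
The test call above only prints temporary credentials. To actually run commands as the assumed role, those credentials have to reach the environment. One possible sketch follows; creds_to_exports is a hypothetical helper name, and it assumes the three values arrive as a single tab-separated line, which is what --output text produces for the query shown in the usage comment.

```shell
# creds_to_exports: read "AccessKeyId<TAB>SecretAccessKey<TAB>SessionToken"
# on stdin and print matching export statements.
creds_to_exports() {
  local key secret token
  IFS="$(printf '\t')" read -r key secret token
  [ -n "$token" ] || { echo "expected three tab-separated fields" >&2; return 1; }
  printf "export AWS_ACCESS_KEY_ID='%s'\n" "$key"
  printf "export AWS_SECRET_ACCESS_KEY='%s'\n" "$secret"
  printf "export AWS_SESSION_TOKEN='%s'\n" "$token"
}

# Usage:
#   eval "$(aws sts assume-role \
#     --role-arn arn:aws:iam::222222222222:role/cross-account-deploy \
#     --role-session-name test-session \
#     --external-id prod-deploy-2024 \
#     --query 'Credentials.[AccessKeyId,SecretAccessKey,SessionToken]' \
#     --output text | creds_to_exports)"
#   aws sts get-caller-identity   # should now report the cross-account-deploy role
```

Unset the three variables afterwards to drop back to the original identity.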

Emergency Break-Glass IAM Role

A pre-configured role for incident response that bypasses normal permission restrictions. Set it up before you need it.

# Create the break-glass role (admin access, heavily logged)
cat > break-glass-trust.json << 'EOF'
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": {
      "AWS": "arn:aws:iam::123456789012:root"
    },
    "Action": "sts:AssumeRole",
    "Condition": {
      "Bool": {"aws:MultiFactorAuthPresent": "true"},
      "NumericLessThan": {"aws:MultiFactorAuthAge": "3600"}
    }
  }]
}
EOF

aws iam create-role \
  --role-name break-glass-admin \
  --assume-role-policy-document file://break-glass-trust.json \
  --max-session-duration 3600  # 1 hour max

aws iam attach-role-policy \
  --role-name break-glass-admin \
  --policy-arn arn:aws:iam::aws:policy/AdministratorAccess

# Set up CloudWatch alarm for when this role is used
# (use CloudTrail + EventBridge to trigger PagerDuty/Slack alert)
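
The alerting step is only described in a comment above. A rough sketch of the EventBridge side follows. The function names, the rule name break-glass-assumed, and the SNS topic ARN are placeholders, not part of the original setup. It also assumes the rule is created in the region where the STS calls are logged and that the topic's resource policy lets EventBridge publish to it.

```shell
# write_break_glass_pattern ROLE_ARN FILE: write an event pattern that matches
# CloudTrail-recorded sts:AssumeRole calls targeting ROLE_ARN.
write_break_glass_pattern() {
  cat > "$2" << EOF
{
  "source": ["aws.sts"],
  "detail-type": ["AWS API Call via CloudTrail"],
  "detail": {
    "eventSource": ["sts.amazonaws.com"],
    "eventName": ["AssumeRole"],
    "requestParameters": {"roleArn": ["$1"]}
  }
}
EOF
}

# create_break_glass_alert: wire the pattern to a rule and an SNS target.
# The rule name and topic ARN are hypothetical.
create_break_glass_alert() {
  write_break_glass_pattern \
    "arn:aws:iam::123456789012:role/break-glass-admin" break-glass-pattern.json
  aws events put-rule \
    --name break-glass-assumed \
    --event-pattern file://break-glass-pattern.json
  aws events put-targets \
    --rule break-glass-assumed \
    --targets "Id=notify,Arn=arn:aws:sns:us-east-1:123456789012:security-alerts"
}
```

From the SNS topic, a PagerDuty or Slack subscription delivers the page. Assume the role once in a drill to confirm the alert actually fires.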

Requirements:

- MFA required to assume
- 1-hour max session duration
- CloudTrail logging (always on)
- Alert fires every time the role is assumed
- Documented in runbook: when to use, who can authorize, post-incident review process

Rotating Access Keys

Access keys should be rotated every 90 days. Automate this.

# List access keys for a user
aws iam list-access-keys --user-name svc-deploy \
  --query 'AccessKeyMetadata[].{KeyId:AccessKeyId,Status:Status,Created:CreateDate}'

# Create new key
NEW_KEY=$(aws iam create-access-key --user-name svc-deploy)
echo "$NEW_KEY" | jq '{AccessKeyId: .AccessKey.AccessKeyId, SecretAccessKey: .AccessKey.SecretAccessKey}'

# Update the application/CI system with the new key
# ... (deploy the new credentials) ...

# Deactivate old key (do NOT delete yet — gives you rollback)
aws iam update-access-key \
  --user-name svc-deploy \
  --access-key-id AKIAOLD123456 \
  --status Inactive

# After confirming everything works (wait 24-48 hours), delete
aws iam delete-access-key \
  --user-name svc-deploy \
  --access-key-id AKIAOLD123456
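
The 90-day rule is easier to enforce if something reports which keys are due. A minimal sketch, assuming GNU date; key_age_days and old_keys are illustrative names rather than existing tooling.

```shell
# key_age_days CREATED [NOW]: whole days between two ISO 8601 timestamps.
key_age_days() {
  local created now
  created=$(date -u -d "$1" +%s) || return 1
  now=$(date -u -d "${2:-now}" +%s) || return 1
  echo $(( (now - created) / 86400 ))
}

# old_keys USER MAX_DAYS: print access keys of USER that are at least MAX_DAYS old.
old_keys() {
  local id created
  aws iam list-access-keys --user-name "$1" \
    --query 'AccessKeyMetadata[].[AccessKeyId,CreateDate]' --output text |
  while read -r id created; do
    if [ "$(key_age_days "$created")" -ge "$2" ]; then
      echo "ROTATE: $id (created $created)"
    fi
  done
}

# Usage:
#   old_keys svc-deploy 90
```

Run from a scheduled job over all service users, this gives the rotation procedure above a trigger instead of relying on memory.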

Conditional MFA Policies

Require MFA for sensitive operations while allowing low-risk reads without it.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowReadWithoutMFA",
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:ListBucket",
        "ec2:Describe*"
      ],
      "Resource": "*"
    },
    {
      "Sid": "AllowWriteOnlyWithMFA",
      "Effect": "Allow",
      "Action": [
        "s3:PutObject",
        "s3:DeleteObject",
        "ec2:RunInstances",
        "ec2:TerminateInstances"
      ],
      "Resource": "*",
      "Condition": {
        "Bool": {"aws:MultiFactorAuthPresent": "true"}
      }
    }
  ]
}
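
With this policy attached, CLI calls made with long-term access keys fail the MFA condition, because those requests carry no MFA context. One way to satisfy it from the command line is to trade the keys plus an MFA code for a session token. The wrapper below is a sketch under that assumption: with_mfa is a hypothetical name, and the MFA device ARN in the usage comment is a placeholder.

```shell
# with_mfa SERIAL CODE CMD...: run CMD with temporary credentials from an
# MFA-authenticated session. Must be called with long-term IAM user credentials.
with_mfa() {
  local serial=$1 code=$2 creds key secret token
  shift 2
  creds=$(aws sts get-session-token \
    --serial-number "$serial" --token-code "$code" \
    --query 'Credentials.[AccessKeyId,SecretAccessKey,SessionToken]' \
    --output text) || return 1
  # The three values arrive on one whitespace-separated line.
  read -r key secret token << EOF
$creds
EOF
  AWS_ACCESS_KEY_ID=$key AWS_SECRET_ACCESS_KEY=$secret AWS_SESSION_TOKEN=$token "$@"
}

# Usage:
#   with_mfa arn:aws:iam::123456789012:mfa/alice 123456 \
#     aws ec2 terminate-instances --instance-ids i-0123456789abcdef0
```

The temporary credentials exist only in the environment of the wrapped command, so nothing needs cleaning up afterwards.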

S3 Bucket Policy Debugging

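S3 access is decided by the IAM identity policy, the bucket policy, and legacy ACLs together. Within one account, an Allow in either the identity policy or the bucket policy is enough; across accounts, both must allow the request. An explicit Deny in any of them always wins. When access fails, check all three.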

# Check bucket policy
aws s3api get-bucket-policy --bucket my-bucket | jq '.Policy | fromjson'

# Check bucket ACL (legacy — should be disabled)
aws s3api get-bucket-acl --bucket my-bucket

# Check if public access block is in place
aws s3api get-public-access-block --bucket my-bucket

# Check if the bucket is using S3 Object Ownership (ACLs disabled)
aws s3api get-bucket-ownership-controls --bucket my-bucket

# Test access with a specific role
aws iam simulate-principal-policy \
  --policy-source-arn arn:aws:iam::123456789012:role/app-role \
  --action-names s3:GetObject \
  --resource-arns "arn:aws:s3:::my-bucket/data/file.json" \
  --query 'EvaluationResults[].EvalDecision'

# Common S3 policy mistake: bucket-level vs object-level ARNs
# ListBucket needs:  arn:aws:s3:::my-bucket
# GetObject needs:   arn:aws:s3:::my-bucket/*
# PutObject needs:   arn:aws:s3:::my-bucket/*
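
The bucket-level versus object-level mix-up can be caught mechanically. The check below is a rough heuristic, not an official classification of S3 actions: check_s3_resource is a made-up name, and the action patterns cover only the common cases.

```shell
# check_s3_resource ACTION RESOURCE_ARN: flag statements where the ARN form
# does not fit the action (bucket-level vs object-level).
check_s3_resource() {
  local action=$1 arn=$2
  if [ "$arn" != "*" ]; then
    case $action in
      s3:ListBucket*|s3:GetBucket*|s3:PutBucket*)
        case $arn in
          */*) echo "MISMATCH: $action is bucket-level, got object ARN $arn"; return 1 ;;
        esac ;;
      s3:*Object*)
        case $arn in
          */*) ;;
          *) echo "MISMATCH: $action is object-level, expected $arn/*"; return 1 ;;
        esac ;;
    esac
  fi
  echo "OK: $action on $arn"
}

# Usage:
#   check_s3_resource s3:GetObject arn:aws:s3:::my-bucket
```

A mismatch here does not produce a policy validation error in AWS. The statement simply never matches, which is why the resulting Access Denied is hard to spot.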

EKS IRSA Troubleshooting

When a pod cannot access AWS resources through IRSA.

# Step 1: Verify the service account annotation
kubectl get sa my-app-sa -n default -o jsonpath='{.metadata.annotations.eks\.amazonaws\.com/role-arn}'
# Should output the role ARN

# Step 2: Check if the pod has the projected token
kubectl exec -it my-pod -n default -- ls -la /var/run/secrets/eks.amazonaws.com/serviceaccount/
# Should show: token

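# Step 3: Check the token contents
# (no -it here: a TTY would add carriage returns to the piped output;
#  JWT segments are base64url, so map the alphabet back before decoding)
kubectl exec my-pod -n default -- cat /var/run/secrets/eks.amazonaws.com/serviceaccount/token | \
  cut -d. -f2 | tr '_-' '/+' | base64 -d 2>/dev/null | jq .
# Should show iss (OIDC issuer), sub (service account), aud (sts.amazonaws.com)
# If the output is cut short, the segment lacks "=" padding; pad it to a multiple of 4 characters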

# Step 4: Verify OIDC provider in IAM
aws eks describe-cluster --name my-cluster \
  --query 'cluster.identity.oidc.issuer' --output text

aws iam list-open-id-connect-providers | \
  jq '.OpenIDConnectProviderList[].Arn'

# Step 5: Check the role trust policy
aws iam get-role --role-name my-app-role \
  --query 'Role.AssumeRolePolicyDocument' | jq .
# Verify: the OIDC provider ARN, the sub condition matches namespace:sa-name, aud is sts.amazonaws.com

# Step 6: Test from inside the pod
kubectl exec -it my-pod -n default -- aws sts get-caller-identity
# Should show the IRSA role, not the node role

# Common failures:
# - Service account not annotated
# - Pod spec doesn't reference the service account
# - OIDC provider not created in IAM
# - Trust policy sub condition doesn't match namespace:sa-name exactly
# - Role has no permission policies attached
# - AWS SDK in the container is too old to support IRSA token exchange
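
Decoding the token in Step 3 can fail quietly because JWT segments are unpadded base64url, which base64 -d does not accept as-is. A small decoder, sketched here under the assumption that GNU coreutils base64 is available, restores the alphabet and padding first. jwt_payload is an illustrative name.

```shell
# jwt_payload: read a JWT on stdin and print its decoded payload (second segment).
jwt_payload() {
  local payload
  # base64url uses "-" and "_" where standard base64 uses "+" and "/".
  payload=$(cut -d. -f2 | tr '_-' '/+')
  # Restore the "=" padding that JWTs strip.
  case $(( ${#payload} % 4 )) in
    2) payload="${payload}==" ;;
    3) payload="${payload}=" ;;
  esac
  printf '%s' "$payload" | base64 -d
}

# Usage:
#   kubectl exec my-pod -n default -- \
#     cat /var/run/secrets/eks.amazonaws.com/serviceaccount/token | jwt_payload | jq .
```

Compare the sub claim (system:serviceaccount:NAMESPACE:SA-NAME) character by character with the condition in the role trust policy; that mismatch is the failure listed above that is hardest to see by eye.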

Permission Boundaries for Developer Sandboxes

Let developers create their own IAM roles but limit what those roles can do.

# Create the permission boundary
cat > dev-boundary.json << 'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowedServices",
      "Effect": "Allow",
      "Action": [
        "s3:*",
        "dynamodb:*",
        "lambda:*",
        "sqs:*",
        "sns:*",
        "logs:*",
        "cloudwatch:*",
        "xray:*"
      ],
      "Resource": "*"
    },
    {
      "Sid": "DenyProductionResources",
      "Effect": "Deny",
      "Action": "*",
      "Resource": "*",
      "Condition": {
        "StringEquals": {
          "aws:ResourceTag/Environment": "production"
        }
      }
    },
    {
      "Sid": "DenyDangerousActions",
      "Effect": "Deny",
      "Action": [
        "iam:CreateUser",
        "iam:DeleteRole",
        "iam:DeletePolicy",
        "organizations:*",
        "account:*"
      ],
      "Resource": "*"
    }
  ]
}
EOF

aws iam create-policy \
  --policy-name dev-sandbox-boundary \
  --policy-document file://dev-boundary.json

# Developer policy: allow creating roles only with the boundary attached
cat > dev-policy.json << 'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowCreateRoleWithBoundary",
      "Effect": "Allow",
      "Action": [
        "iam:CreateRole",
        "iam:AttachRolePolicy",
        "iam:PutRolePolicy"
      ],
      "Resource": "arn:aws:iam::123456789012:role/dev-*",
      "Condition": {
        "StringEquals": {
          "iam:PermissionsBoundary": "arn:aws:iam::123456789012:policy/dev-sandbox-boundary"
        }
      }
    },
    {
      "Sid": "AllowPassRoleForLambda",
      "Effect": "Allow",
      "Action": "iam:PassRole",
      "Resource": "arn:aws:iam::123456789012:role/dev-*",
      "Condition": {
        "StringEquals": {
          "iam:PassedToService": "lambda.amazonaws.com"
        }
      }
    }
  ]
}
EOF

This ensures developers can create roles for their Lambda functions, but those roles can never exceed the boundary — even if the developer attaches AdministratorAccess to their role, the boundary caps what it can actually do.
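
From the developer's side, the condition in the policy above means every create-role call has to name the boundary explicitly, otherwise the request is denied. A sketch of such a call follows; create_dev_role and the file name in the usage comment are hypothetical, and the account ID is the placeholder used throughout this page.

```shell
BOUNDARY_ARN="arn:aws:iam::123456789012:policy/dev-sandbox-boundary"

# create_dev_role NAME TRUST_FILE: create a dev-* role with the sandbox boundary attached.
create_dev_role() {
  local name=$1 trust_file=$2
  # The developer policy only covers role names with the dev- prefix.
  case $name in
    dev-*) ;;
    *) echo "role name must start with dev-" >&2; return 1 ;;
  esac
  aws iam create-role \
    --role-name "$name" \
    --assume-role-policy-document "file://$trust_file" \
    --permissions-boundary "$BOUNDARY_ARN"
}

# Usage:
#   create_dev_role dev-thumbnailer lambda-trust.json
```

Checking the prefix locally just fails fast; the real enforcement is still the Resource and Condition elements in the IAM policy.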