AWS IAM: The Permissions Puzzle


Topics: IAM users/roles/policies, policy evaluation logic, AssumeRole, cross-account access, permission boundaries, SCPs, Access Denied debugging, Linux permissions parallels, K8s RBAC parallels
Level: L1–L2 (Foundations to Operations)
Time: 60–90 minutes
Prerequisites: None (everything is explained from scratch)

The Mission

It's Thursday afternoon. Your team's deployment pipeline just started failing:

An error occurred (AccessDenied) when calling the PutObject operation:
Access Denied

The pipeline worked yesterday. Nobody changed the IAM policies — or so they claim. The deploy is blocked, staging is stale, and the product manager is pinging Slack every four minutes.

You need to find out which of AWS's overlapping permission layers said no, why it said no, and how to fix it without giving the pipeline more access than it needs. This lesson teaches you to think through IAM the way you'd think through any multi-layer system: by understanding each layer, then systematically eliminating candidates.

Step Zero: Who Am I?

Before you debug permissions, debug identity. In a large share of "Access Denied" investigations, the caller isn't who you think it is.

aws sts get-caller-identity

{
    "UserId": "AROA3XFRBF23XDCLKWQ6P:deploy-session",
    "Account": "123456789012",
    "Arn": "arn:aws:sts::123456789012:assumed-role/deploy-role/deploy-session"
}

Read this output carefully:

Field	What it tells you
`Account`	Which AWS account you're operating in
`Arn`	The identity making API calls — user, role, or assumed-role session
`UserId`	Starts with `AROA` for assumed roles, `AIDA` for IAM users
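
The table above can be turned into a small classifier. This is a minimal sketch, assuming the output of `aws sts get-caller-identity` has already been loaded into a dict; the function name `describe_caller` is hypothetical and the parsing covers only the common ARN shapes.

```python
def describe_caller(identity: dict) -> dict:
    """Classify the principal from `aws sts get-caller-identity` output."""
    arn = identity["Arn"]
    # ARN layout: arn:partition:service:region:account-id:resource
    _, _, service, _, account, resource = arn.split(":", 5)
    if service == "sts" and resource.startswith("assumed-role/"):
        _, role_name, session_name = resource.split("/", 2)
        return {"kind": "assumed-role", "account": account,
                "role": role_name, "session": session_name}
    if service == "iam" and resource.startswith("user/"):
        # User paths may contain slashes; the name is the last segment.
        return {"kind": "user", "account": account,
                "user": resource.rsplit("/", 1)[-1]}
    if service == "iam" and resource == "root":
        return {"kind": "root", "account": account}
    return {"kind": "other", "account": account, "resource": resource}


caller = describe_caller({
    "UserId": "AROA3XFRBF23XDCLKWQ6P:deploy-session",
    "Account": "123456789012",
    "Arn": "arn:aws:sts::123456789012:assumed-role/deploy-role/deploy-session",
})
print(caller)
```

If `kind` or `role` is not what you expected, stop here: the problem is credentials, not policies.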

Gotcha: On an EC2 instance, the AWS CLI picks up credentials from the instance metadata service (the instance profile role). Inside an EKS pod, IRSA injects a web identity token. If your ~/.aws/credentials file has stale keys, the CLI will use those instead of the instance role you expect. Environment variables (AWS_ACCESS_KEY_ID) override the shared credentials and config files, and those files override instance metadata. Simplified, the credential resolution order is: env vars > credentials/config files > instance metadata. Getting this wrong is the single most common IAM debugging dead end.
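
To make the precedence concrete, here is a small sketch of a simplified lookup order: environment variables first, then the shared credentials and config files, then instance metadata. The real AWS CLI chain has more steps (web identity tokens and SSO, for example), and the names `PRECEDENCE` and `winning_source` are hypothetical.

```python
from typing import Optional

# Simplified precedence, highest first. This is a teaching model,
# not the full AWS CLI credential provider chain.
PRECEDENCE = ["environment", "shared_files", "instance_metadata"]


def winning_source(available: set) -> Optional[str]:
    """Return the first credential source in PRECEDENCE that is present."""
    for source in PRECEDENCE:
        if source in available:
            return source
    return None


# An EC2 instance with a role attached, but stale keys exported in the shell:
print(winning_source({"environment", "instance_metadata"}))
```

In that scenario the stale environment variables win, which is exactly the case where `get-caller-identity` shows an identity you did not expect.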

This is your first move in every IAM investigation. Tattoo it on your forearm if needed.

The Cast of Characters: Users, Roles, and Groups

IAM has three identity types. They exist for different reasons, and choosing the wrong one creates problems that compound over months.

IAM Users: The Permanent Residents

An IAM user is a long-lived identity with persistent credentials: a password for console access and optionally access keys for API calls. Every user gets an ARN:

arn:aws:iam::123456789012:user/alice

Users are appropriate for humans who need console access (with MFA) and for legacy integrations that cannot use temporary credentials. For almost everything else, they are the wrong choice.

IAM Roles: The Costume Changes

A role is an identity you assume. It has no permanent credentials. When you assume a role, AWS STS gives you temporary credentials that expire (default: 1 hour, max: 12 hours).

# Assume a role and get temporary credentials
aws sts assume-role \
  --role-arn arn:aws:iam::123456789012:role/deploy-role \
  --role-session-name debug-session

Roles have two policy surfaces: a trust policy (who can assume this role) and permission policies (what the role can do once assumed).

Name Origin: The term "assume role" comes from theater — an actor assumes a role by putting on a costume and playing a character. In AWS, your original identity temporarily takes on a different set of permissions, like an actor stepping into a part. When the session expires, you're back to being yourself.

IAM Groups: The Organizers

Groups are collections of users. They exist solely to attach policies to multiple users at once. Groups cannot be used as principals in resource policies — you can't write a bucket policy that allows a group. They're an organizational convenience, nothing more.

When to Use What

Scenario	Use	Why
Human console access	IAM user + MFA	Needs persistent password
EC2 instance needing AWS access	IAM role via instance profile	Temporary creds, auto-rotated
Lambda function	IAM role (execution role)	Lambda assumes it automatically
EKS pod	IAM role via IRSA	Pod-level isolation, not node-level
CI/CD pipeline	IAM role via OIDC federation	No long-lived keys to leak
Cross-account access	IAM role with trust policy	Both sides must agree

Mental Model: Think of users as people with house keys and roles as security badges that expire. If someone steals a house key, they have access forever (until you change the lock). If someone steals a badge, it stops working at 5pm.

Flashcard Check #1

Cover the answers, test yourself:

Q: What is the first command you should run when debugging "Access Denied" in AWS?

aws sts get-caller-identity — to verify you're authenticated as the principal you expect.

Q: What's the key difference between an IAM user and an IAM role?

Users have permanent credentials; roles provide temporary credentials via STS. Roles are assumed, users are authenticated.

Q: Can an IAM group be used as a principal in an S3 bucket policy?

No. Groups can only be used to attach policies to users. They cannot appear as principals in resource-based policies.

The Policy Language: Reading JSON That Controls Your Career

IAM policies are JSON documents. Every permission in AWS is defined in this format. Learn to read it fluently — you'll see thousands of these.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowDeployToUploadArtifacts",
      "Effect": "Allow",
      "Action": [
        "s3:PutObject",
        "s3:GetObject"
      ],
      "Resource": "arn:aws:s3:::deploy-artifacts-prod/*",
      "Condition": {
        "StringEquals": {
          "aws:RequestedRegion": "us-east-1"
        }
      }
    }
  ]
}

Breaking it down:

Field	Meaning
`Version`	Always `"2012-10-17"`. The other version (`"2008-10-17"`) is ancient and breaks variable substitution.
`Sid`	Statement ID — optional, human-readable label
`Effect`	`Allow` or `Deny`. That's it. Two options.
`Action`	What API calls this covers. `s3:PutObject`, `ec2:RunInstances`, etc.
`Resource`	Which specific resources — as ARNs. Scope this as tightly as possible.
`Condition`	Optional restrictions: IP ranges, tags, MFA status, time of day
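
One way to internalize these fields is to write a toy matcher for a single statement. The sketch below is an approximation under stated assumptions: it uses Python's `fnmatch` for wildcards (IAM's own matching differs in details), it ignores `Condition` entirely, and `statement_matches` is a hypothetical helper, not an AWS API.

```python
from fnmatch import fnmatchcase


def statement_matches(statement: dict, action: str, resource: str) -> bool:
    """True if the statement's Action and Resource both cover the request.

    Conditions are NOT evaluated here; this only shows how Action and
    Resource patterns are compared against a request.
    """
    actions = statement["Action"]
    resources = statement["Resource"]
    if isinstance(actions, str):
        actions = [actions]
    if isinstance(resources, str):
        resources = [resources]
    # Action names are case-insensitive; resource ARNs are compared as-is.
    action_ok = any(fnmatchcase(action.lower(), a.lower()) for a in actions)
    resource_ok = any(fnmatchcase(resource, r) for r in resources)
    return action_ok and resource_ok


stmt = {
    "Effect": "Allow",
    "Action": ["s3:PutObject", "s3:GetObject"],
    "Resource": "arn:aws:s3:::deploy-artifacts-prod/*",
}
print(statement_matches(stmt, "s3:PutObject",
                        "arn:aws:s3:::deploy-artifacts-prod/build-42.tar.gz"))
print(statement_matches(stmt, "s3:PutObject",
                        "arn:aws:s3:::deploy-artifacts-staging/build-42.tar.gz"))
```

The second call shows how a one-word difference in the bucket name turns a match into no match.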

There are three places policies can live:

Type	Where	Use case
Managed policies	Standalone, attached to users/roles/groups	Reusable, versioned, auditable
Inline policies	Embedded in a single user/role/group	Tightly coupled — harder to find
Resource policies	Attached to the resource (S3 bucket, KMS key)	Controls who can access this resource

Gotcha: Inline policies do not appear in the IAM policy list. If you have 200 roles with inline policies, auditing who can do what requires checking every principal individually. Prefer managed policies. The one policy that is always embedded in a role is its trust policy, which AWS stores on the role itself (you read it with aws iam get-role, not list-role-policies).

The Evaluation Flowchart: How AWS Actually Decides

This is the core mental model. When an API call arrives, AWS evaluates policies in a specific order. Understanding this order is the difference between random debugging and systematic elimination.

API Request arrives
        │
        ▼
┌─────────────────────┐
│ Explicit Deny in     │──── YES ──→ DENIED (game over)
│ ANY policy?          │
└──────────┬──────────┘
           │ NO
           ▼
┌─────────────────────┐
│ SCP allows it?       │──── NO ───→ DENIED (org guardrail)
│ (Organization level) │
└──────────┬──────────┘
           │ YES
           ▼
┌─────────────────────┐
│ Resource-based policy│──── YES, same ──→ ALLOWED
│ allows it?           │     account        (early exit)
└──────────┬──────────┘
           │ NO / cross-account
           ▼
┌─────────────────────┐
│ Permission boundary  │──── NO ───→ DENIED (ceiling hit)
│ allows it?           │
└──────────┬──────────┘
           │ YES
           ▼
┌─────────────────────┐
│ Session policy       │──── NO ───→ DENIED (session filter)
│ allows it?           │
└──────────┬──────────┘
           │ YES
           ▼
┌─────────────────────┐
│ Identity-based policy│──── YES ──→ ALLOWED
│ allows it?           │
└──────────┬──────────┘
           │ NO
           ▼
      DENIED (implicit — nothing allowed it)

The critical rules to memorize:

Explicit Deny always wins. It doesn't matter what else allows the action. An explicit "Effect": "Deny" anywhere in the chain overrides everything.
Implicit deny is the default. If nothing explicitly allows an action, it's denied.
SCPs are guardrails, not grants. They restrict what's possible in an account but never give permissions.
Permission boundaries are ceilings. The effective permissions are the intersection of the identity policy and the boundary — not the union.
Cross-account is stricter. Both the source account's identity policy AND the target account's resource policy must allow the action. One-sided access doesn't work.
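
The flowchart and rules above can be sketched as a short function. This is a deliberately simplified model: same-account requests only, exact action names instead of wildcards, no conditions, and each layer reduced to a set of allowed actions. The function `evaluate` and its parameters are hypothetical.

```python
def evaluate(action, identity, explicit_denies=frozenset(), scp=None,
             resource_policy=None, boundary=None, session=None):
    """Walk the flowchart for one action in a single account.

    Each layer is a set of allowed action names; None means the layer is
    not attached and is skipped.
    """
    if action in explicit_denies:
        return "explicitDeny"
    if scp is not None and action not in scp:
        return "implicitDeny (SCP)"
    if resource_policy is not None and action in resource_policy:
        return "allowed (resource policy)"
    if boundary is not None and action not in boundary:
        return "implicitDeny (permission boundary)"
    if session is not None and action not in session:
        return "implicitDeny (session policy)"
    if action in identity:
        return "allowed"
    return "implicitDeny"


identity = {"s3:PutObject", "s3:GetObject"}
print(evaluate("s3:PutObject", identity))
print(evaluate("s3:PutObject", identity, boundary={"s3:GetObject"}))
print(evaluate("s3:PutObject", identity, explicit_denies={"s3:PutObject"}))
```

The three calls print `allowed`, `implicitDeny (permission boundary)`, and `explicitDeny`: the same identity policy, three different outcomes depending on the other layers.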

Under the Hood: Reasoning about policies is not a simple if/else exercise. AWS built an automated reasoning engine called Zelkova, based on satisfiability modulo theories (SMT), the same mathematical foundations used in formal verification of hardware. Zelkova does not evaluate live requests; it analyzes the policies themselves. IAM Access Analyzer, launched in 2019, uses it to prove whether a policy can grant access to external principals, providing mathematically grounded answers rather than heuristic guesses.

The Cross-Domain Bridge: Linux Permissions and K8s RBAC

IAM is not the only permission system you'll wrestle with. The mental model transfers directly to Linux file permissions and Kubernetes RBAC. Here's the Rosetta Stone:

Concept	AWS IAM	Linux	K8s RBAC
Identity	IAM user/role	UID/GID	User/ServiceAccount
What they can do	IAM policy	rwx bits + ACLs	Role/ClusterRole
Binding identity to permissions	Attach policy to role	chown + chmod	RoleBinding/ClusterRoleBinding
Scope	Account / resource ARN	File / directory	Namespace / cluster
Deny override	Explicit Deny wins	No read bit = denied	No RBAC rule = denied
Escalation guard	Permission boundary	sudo / capabilities	`escalate` verb restriction
Org-level guardrail	SCP	SELinux / AppArmor	Pod Security Standards
"Who am I?"	`aws sts get-caller-identity`	`id`	`kubectl auth whoami`
"Can I do X?"	`aws iam simulate-principal-policy`	`test -r file`	`kubectl auth can-i`

Name Origin: The word "principal" comes from Latin principalis — "first, chief." In security, a principal is any entity that can be authenticated: a user, a role, a service, a process. You'll see it in IAM (principals in policies), Kerberos (principal names), and TLS (the subject of a certificate). Same word, same concept, across every permission system.

The parallels go deep. In Linux, when you can't read a file:

$ cat /etc/shadow
cat: /etc/shadow: Permission denied

$ ls -la /etc/shadow
-rw-r----- 1 root shadow 1234 Mar 10 08:00 /etc/shadow

$ id
uid=1000(deploy) gid=1000(deploy) groups=1000(deploy),27(sudo)

Your debugging ladder is: check identity (id), check permissions (ls -la), check group membership, check ACLs (getfacl), check if SELinux is blocking (getenforce, audit2why).
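
The first rungs of that ladder (identity, mode bits, group membership) can be sketched in a few lines. This models only the classic owner/group/other check and ignores ACLs, capabilities, and SELinux; the gid 42 for the shadow group is an assumption for illustration, and `can_read` is a hypothetical helper.

```python
import stat


def can_read(mode: int, owner_uid: int, owner_gid: int,
             uid: int, gids: set) -> bool:
    """Classic Unix read check: exactly one of owner/group/other applies."""
    if uid == 0:
        return True  # root bypasses the mode bits for reads
    if uid == owner_uid:
        return bool(mode & stat.S_IRUSR)
    if owner_gid in gids:
        return bool(mode & stat.S_IRGRP)
    return bool(mode & stat.S_IROTH)


SHADOW_GID = 42  # assumed gid of the "shadow" group
# /etc/shadow is -rw-r----- root:shadow; deploy is uid 1000 in groups 1000 and 27.
print(can_read(0o640, 0, SHADOW_GID, uid=1000, gids={1000, 27}))
print(can_read(0o640, 0, SHADOW_GID, uid=1000, gids={1000, 27, SHADOW_GID}))
```

The first call is denied because only the "other" bits apply to deploy; adding the shadow group switches the check to the group bits, which include read.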

In Kubernetes, when a service account can't list pods:

$ kubectl get pods -n production
Error from server (Forbidden): pods is forbidden: User "system:serviceaccount:ci:deployer"
cannot list resource "pods" in API group "" in the namespace "production"

# Debug: can this SA do what we need?
$ kubectl auth can-i list pods -n production \
    --as=system:serviceaccount:ci:deployer
no

Same pattern: check identity, check what's granted, find the gap.

Mental Model: Every permission system is a gate with the same three questions: Who are you? (authentication), What do you want to do? (action + resource), and Are you allowed? (authorization). The specifics vary — JSON policies vs rwx bits vs RBAC rules — but the debugging ladder is universal: verify identity first, then check each authorization layer from the outside in.

Flashcard Check #2

Q: In AWS policy evaluation, what always wins — an explicit Allow or an explicit Deny?

Explicit Deny. Always. An explicit Deny in any policy (SCP, identity, resource, boundary) overrides all Allows.

Q: What is a permission boundary?

A permission boundary is a managed policy attached to an IAM role or user that sets the maximum permissions that identity can have. The effective permissions are the intersection of the boundary and the identity policy — the boundary is a ceiling, not a floor.

Q: What is the K8s equivalent of aws sts get-caller-identity?

kubectl auth whoami (or kubectl auth can-i --list to see what the current identity can do).

The "Access Denied" Debugging Ladder

Back to our failing pipeline. Here's the systematic approach — a checklist that eliminates one layer at a time, starting with the most common causes.

Rung 1: Verify Identity

aws sts get-caller-identity

If the ARN is not the role you expect, you have a credential resolution problem, not a permission problem. Check environment variables, AWS profiles, and instance metadata.

Rung 2: Simulate the Action

aws iam simulate-principal-policy \
  --policy-source-arn arn:aws:iam::123456789012:role/deploy-role \
  --action-names s3:PutObject \
  --resource-arns "arn:aws:s3:::deploy-artifacts-prod/build-42.tar.gz" \
  --query 'EvaluationResults[].{Action:EvalActionName,Decision:EvalDecision}'

[
    {
        "Action": "s3:PutObject",
        "Decision": "implicitDeny"
    }
]

The simulator tests the identity-based policies without making a real API call. If it says implicitDeny, no identity policy grants the action. If it says explicitDeny, something is actively blocking.

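Trivia: The policy simulator gives a partial view. By default, simulate-principal-policy evaluates the principal's identity-based policies and its permission boundary. A resource-based policy is considered only if you pass it in with --resource-policy, VPC endpoint policies are not evaluated at all, and SCP handling is limited (SCPs that contain conditions are not evaluated). For a full picture, you need to check each layer manually. This limitation trips up even experienced engineers who treat the simulator result as the final answer.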
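
When you simulate many actions at once, it helps to reduce the output to just the problems. The sketch below assumes the simulator's full JSON response has been saved as a string; the sample data is made up, and `blocked_actions` is a hypothetical helper.

```python
import json

# Made-up sample shaped like `aws iam simulate-principal-policy` output.
SAMPLE = json.dumps({
    "EvaluationResults": [
        {"EvalActionName": "s3:PutObject",
         "EvalResourceName": "arn:aws:s3:::deploy-artifacts-prod/build-42.tar.gz",
         "EvalDecision": "implicitDeny",
         "MatchedStatements": []},
        {"EvalActionName": "s3:GetObject",
         "EvalResourceName": "arn:aws:s3:::deploy-artifacts-prod/build-42.tar.gz",
         "EvalDecision": "allowed",
         "MatchedStatements": []},
    ]
})


def blocked_actions(raw: str) -> dict:
    """Map each action that was not allowed to its simulator decision."""
    results = json.loads(raw)["EvaluationResults"]
    return {r["EvalActionName"]: r["EvalDecision"]
            for r in results if r["EvalDecision"] != "allowed"}


print(blocked_actions(SAMPLE))
```

An `implicitDeny` entry points you at a missing Allow; an `explicitDeny` entry points you at a Deny statement to hunt down.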

Rung 3: Decode the Error

Some AWS services return encoded authorization messages. Decode them:

aws sts decode-authorization-message \
  --encoded-message "<paste-encoded-message>" | \
  jq '.DecodedMessage | fromjson'

This shows which policy denied, what action, what resource, and which conditions failed. Not all services provide this — S3, notably, does not. But when it's available, it's a goldmine.
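
The jq filter above works because the decoded message is JSON wrapped inside a JSON string. The same two-step decode in Python looks like this; the payload is illustrative only, and the exact field names inside it should be treated as an assumption rather than a documented schema.

```python
import json

# Illustrative payload only: the inner field names are an assumption
# about what decoded messages typically contain.
inner = {
    "allowed": False,
    "explicitDeny": True,
    "context": {
        "action": "ec2:RunInstances",
        "resource": "arn:aws:ec2:eu-west-1:123456789012:instance/*",
    },
}
# The CLI returns the decoded message as a string field, so it is encoded twice.
cli_output = json.dumps({"DecodedMessage": json.dumps(inner)})


def decode(cli_json: str) -> dict:
    """Parse the outer response, then parse the DecodedMessage string."""
    outer = json.loads(cli_json)
    return json.loads(outer["DecodedMessage"])


message = decode(cli_output)
print(message["context"]["action"], "explicit deny:", message["explicitDeny"])
```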

Rung 4: Check All Policy Layers

Work through each layer from the evaluation flowchart:

# Identity policies (managed + inline)
aws iam list-attached-role-policies --role-name deploy-role
aws iam list-role-policies --role-name deploy-role

# Permission boundary
aws iam get-role --role-name deploy-role \
  --query 'Role.PermissionsBoundary.PermissionsBoundaryArn'

# Resource policy (S3 bucket policy in this case)
aws s3api get-bucket-policy --bucket deploy-artifacts-prod \
  | jq '.Policy | fromjson'

# VPC endpoint policy (if traffic goes through a VPC endpoint)
aws ec2 describe-vpc-endpoints \
  --filters "Name=service-name,Values=com.amazonaws.us-east-1.s3" \
  --query 'VpcEndpoints[].PolicyDocument'

# SCPs (requires Organization management account access)
aws organizations list-policies-for-target \
  --target-id 123456789012 \
  --filter SERVICE_CONTROL_POLICY

Rung 5: Check CloudTrail

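CloudTrail records management events by default, including ones that were denied. Object-level S3 calls such as PutObject are data events: they are logged only if data event logging is enabled on a trail, and lookup-events never returns them. For this investigation, the more useful question is what changed in IAM recently:

# IAM is a global service; its events are recorded in us-east-1.
# date -d is GNU date syntax; on macOS use: date -u -v-24H +%Y-%m-%dT%H:%M:%SZ
aws cloudtrail lookup-events \
  --region us-east-1 \
  --lookup-attributes AttributeKey=EventSource,AttributeValue=iam.amazonaws.com \
  --start-time "$(date -u -d '24 hours ago' +%Y-%m-%dT%H:%M:%SZ)" \
  --query 'Events[].CloudTrailEvent' \
  --output json | jq '.[] | fromjson | {eventTime, eventName, caller: .userIdentity.arn, errorCode}'

CloudTrail is your audit trail. Each record shows when a call happened, from which IP, with which credentials, and whether it failed (errorCode). If the pipeline was working yesterday and isn't today, the IAM events in between show you what changed.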
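
If you export CloudTrail records and want to sift them offline, a few lines of Python are enough. This sketch assumes each record is a JSON string shaped like a CloudTrailEvent; the two sample records are made up, and `denied_calls` is a hypothetical helper.

```python
import json


def denied_calls(raw_events):
    """Return (time, event name, caller ARN) for AccessDenied records."""
    denied = []
    for raw in raw_events:
        event = json.loads(raw)
        if event.get("errorCode") == "AccessDenied":
            denied.append((event["eventTime"], event["eventName"],
                           event["userIdentity"]["arn"]))
    return denied


# Two made-up records: one successful call, one denied call.
SAMPLE_EVENTS = [
    json.dumps({"eventTime": "2026-01-15T16:42:00Z",
                "eventName": "CreatePolicyVersion",
                "userIdentity": {"arn": "arn:aws:iam::123456789012:user/alice"}}),
    json.dumps({"eventTime": "2026-01-16T09:03:11Z",
                "eventName": "GetBucketPolicy",
                "errorCode": "AccessDenied",
                "userIdentity": {"arn": "arn:aws:sts::123456789012:assumed-role/deploy-role/deploy-session"}}),
]
print(denied_calls(SAMPLE_EVENTS))
```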

War Story: The Wildcard That Cost $50,000

War Story: A startup's engineer needed to give a deployment role access to upload artifacts to S3. Under deadline pressure, they wrote "Resource": "*" instead of scoping to the deploy bucket. The role also had s3:DeleteObject. Two weeks later, an automated cleanup script — running with the same role — deleted objects from a billing analytics bucket that fed financial reporting. The data loss wasn't discovered until month-end reconciliation. Recovery from S3 versioning took 12 hours of engineering time. The root cause wasn't malice — it was "Resource": "*" under time pressure.

The lesson: "Resource": "*" in an Allow statement is a security finding, full stop. Scope to specific ARNs. Future-you will thank past-you.

Cross-Account Access: The Two-Sided Handshake

Cross-account role assumption requires both accounts to agree. Think of it as two locks on one door — both must be unlocked.

Account A (source)                    Account B (target)
┌─────────────────────┐              ┌──────────────────────┐
│ ci-pipeline role     │─── STS ────→│ cross-account-deploy │
│                      │  AssumeRole │         role          │
│ Identity policy must │              │ Trust policy must    │
│ allow sts:AssumeRole │              │ allow Account A's    │
│ on the target role   │              │ principal             │
└─────────────────────┘              └──────────────────────┘

In Account B, the trust policy on the target role says who can assume it:

{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": {
      "AWS": "arn:aws:iam::111111111111:role/ci-pipeline"
    },
    "Action": "sts:AssumeRole",
    "Condition": {
      "StringEquals": {
        "sts:ExternalId": "deploy-2026-prod"
      }
    }
  }]
}

In Account A, the source role's identity policy says where it can assume roles:

{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Action": "sts:AssumeRole",
    "Resource": "arn:aws:iam::222222222222:role/cross-account-deploy"
  }]
}

Both must exist. One-sided = Access Denied.
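
The two-sided check can be sketched as two small functions that must both return true. This is a toy model under stated assumptions: exact ARN matches only (no wildcards), only the `sts:ExternalId` condition is understood, and all function names are hypothetical.

```python
def _as_list(value):
    return value if isinstance(value, list) else [value]


def identity_side_allows(identity_policy: dict, target_role_arn: str) -> bool:
    """Source account: does an Allow cover sts:AssumeRole on the target?"""
    return any(
        s["Effect"] == "Allow"
        and "sts:AssumeRole" in _as_list(s["Action"])
        and target_role_arn in _as_list(s["Resource"])
        for s in identity_policy["Statement"]
    )


def trust_side_allows(trust_policy: dict, source_role_arn: str,
                      external_id=None) -> bool:
    """Target account: does the trust policy name this caller?"""
    for s in trust_policy["Statement"]:
        if s["Effect"] != "Allow" or "sts:AssumeRole" not in _as_list(s["Action"]):
            continue
        if source_role_arn not in _as_list(s["Principal"].get("AWS", [])):
            continue
        required = s.get("Condition", {}).get("StringEquals", {}).get("sts:ExternalId")
        if required is None or required == external_id:
            return True
    return False


SOURCE = "arn:aws:iam::111111111111:role/ci-pipeline"
TARGET = "arn:aws:iam::222222222222:role/cross-account-deploy"
trust = {"Statement": [{
    "Effect": "Allow", "Principal": {"AWS": SOURCE}, "Action": "sts:AssumeRole",
    "Condition": {"StringEquals": {"sts:ExternalId": "deploy-2026-prod"}}}]}
identity = {"Statement": [{
    "Effect": "Allow", "Action": "sts:AssumeRole", "Resource": TARGET}]}

ok = identity_side_allows(identity, TARGET) and trust_side_allows(
    trust, SOURCE, external_id="deploy-2026-prod")
print(ok)
```

Remove either policy, or pass the wrong external ID, and the combined result flips to false.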

Gotcha: If the trust policy uses "Principal": {"AWS": "arn:aws:iam::111111111111:root"}, it trusts the entire source account: any user or role in that account whose own identity policy allows sts:AssumeRole on the target role can assume it. Scope to specific role ARNs. The sts:ExternalId condition adds defense against the "confused deputy" problem, where a third-party service might be tricked into assuming roles on behalf of the wrong customer.

Permission Boundaries: The Ceiling That Cannot Be Broken

Permission boundaries solve a real organizational problem: how do you let developers create their own IAM roles (for Lambda functions, for example) without those roles becoming backdoors to admin access?

A permission boundary is a managed policy attached to a role that sets the maximum permissions, regardless of what identity policies are attached. The effective permissions are:

Effective = Identity Policy ∩ Permission Boundary

That's an intersection, not a union. If the boundary allows S3 and DynamoDB, but the identity policy grants AdministratorAccess, the role can still only touch S3 and DynamoDB.
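
Set arithmetic shows the intersection directly. The action names below are chosen only for illustration, and real policies use wildcards and conditions that plain sets cannot capture.

```python
# What the identity policy grants (a broad, admin-like set for illustration).
identity_actions = {"s3:GetObject", "s3:PutObject", "dynamodb:GetItem",
                    "iam:CreateUser", "ec2:RunInstances"}
# What the permission boundary permits at most.
boundary_actions = {"s3:GetObject", "s3:PutObject",
                    "dynamodb:GetItem", "dynamodb:PutItem"}

# Effective permissions: only actions present in BOTH sets.
effective = identity_actions & boundary_actions
print(sorted(effective))
```

Note that `dynamodb:PutItem` is not effective either: the boundary permits it, but the identity policy never grants it. A boundary caps permissions, it never adds them.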

# Check whether a role has a boundary
aws iam get-role --role-name deploy-role \
  --query 'Role.PermissionsBoundary'

If this returns a boundary ARN, that boundary caps everything the role can do. The most common debugging miss: the identity policy allows the action, the resource policy allows it, but a permission boundary silently blocks it.

War Story: A platform team gave developers permission to create IAM roles for their Lambda functions but forgot to require a permission boundary. A developer — with no malicious intent — created a role with AdministratorAccess attached. That Lambda function could now read secrets from every other team's S3 buckets, modify DynamoDB tables it had no business touching, and create new IAM users. The fix was adding a condition to the developers' CreateRole permissions: "iam:PermissionsBoundary": "arn:aws:iam::123456789012:policy/dev-boundary". Without that condition, CreateRole permission is effectively a privilege escalation path.

Service Control Policies: The Organizational Guardrails

SCPs are the outermost fence. They're managed at the AWS Organizations level and restrict what member accounts can do — even if the account's own IAM policies allow it.

SCPs do not grant permissions. They only restrict. Think of them as the outer limit of what's possible in the account. A common SCP pattern is a deny list like this one:

{
  "Version": "2012-10-17",
  "Statement": [{
    "Sid": "DenyRegionsOutsideUS",
    "Effect": "Deny",
    "Action": "*",
    "Resource": "*",
    "Condition": {
      "StringNotEquals": {
        "aws:RequestedRegion": ["us-east-1", "us-west-2"]
      },
      "ArnNotLike": {
        "aws:PrincipalARN": "arn:aws:iam::*:role/OrganizationAdmin"
      }
    }
  }]
}

This denies all API calls outside us-east-1 and us-west-2, except for a break-glass admin role. No identity policy can override this — SCPs are evaluated before identity policies in the chain.
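
The Deny in this SCP applies only when both condition operators are true at once. The sketch below mirrors that AND logic; it approximates ArnNotLike with Python's `fnmatch`, and `scp_denies` is a hypothetical helper.

```python
from fnmatch import fnmatchcase

ALLOWED_REGIONS = {"us-east-1", "us-west-2"}
EXEMPT_PRINCIPAL = "arn:aws:iam::*:role/OrganizationAdmin"


def scp_denies(region: str, principal_arn: str) -> bool:
    """True when the region-restriction Deny statement applies."""
    # StringNotEquals with several values: true only if none of them match.
    region_outside = region not in ALLOWED_REGIONS
    # ArnNotLike: true when the caller does not match the exempt pattern.
    not_exempt = not fnmatchcase(principal_arn, EXEMPT_PRINCIPAL)
    # Multiple operators in one Condition block are ANDed together.
    return region_outside and not_exempt


deploy = "arn:aws:iam::123456789012:role/deploy-role"
admin = "arn:aws:iam::123456789012:role/OrganizationAdmin"
print(scp_denies("eu-west-1", deploy))
print(scp_denies("us-east-1", deploy))
print(scp_denies("eu-west-1", admin))
```

Only the first call is denied: the deploy role in an unlisted region. The break-glass role passes everywhere, and everyone passes in the two allowed regions.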

Trivia: SCPs do not affect the organization's management account. If you're debugging "Access Denied" in a member account and the action works fine from the management account, an SCP is your prime suspect. Use aws organizations list-policies-for-target to see which SCPs apply to the account.

Flashcard Check #3

Q: For cross-account access, what two things must both be true?

The source account's identity policy must allow sts:AssumeRole on the target role, AND the target role's trust policy must allow the source account's principal. Both sides must agree.

Q: What is the relationship between a permission boundary and an identity policy?

Effective permissions are the intersection of the two. The boundary is a ceiling — identity policies cannot exceed it.

Q: Do SCPs grant permissions?

No. SCPs only restrict. They define the maximum possible permissions for an account. Actual permissions require identity policies (and optionally resource policies).

Solving the Mystery: Our Pipeline Case

Let's return to our failing deploy. Walking the ladder:

Rung 1 — get-caller-identity shows the correct role. Identity is fine.

Rung 2 — simulate-principal-policy returns implicitDeny for s3:PutObject. The identity policy doesn't grant this action.

Rung 3 — No encoded message available (S3 doesn't provide them).

Rung 4 — We check the attached policies:

aws iam list-attached-role-policies --role-name deploy-role

{
    "AttachedPolicies": [
        {
            "PolicyName": "deploy-artifacts-v2",
            "PolicyArn": "arn:aws:iam::123456789012:policy/deploy-artifacts-v2"
        }
    ]
}

We pull the policy:

aws iam get-policy-version \
  --policy-arn arn:aws:iam::123456789012:policy/deploy-artifacts-v2 \
  --version-id v2 \
  --query 'PolicyVersion.Document'

The policy allows s3:PutObject on arn:aws:s3:::deploy-artifacts-staging/* — staging, not prod. Someone updated the policy to version 2 yesterday and changed the bucket name. The version they tested was for staging. The production bucket ARN was lost in the edit.

Root cause: A policy version change that swapped the Resource ARN. CloudTrail confirms the CreatePolicyVersion event at 16:42 yesterday, by alice.

Fix: Create a new policy version (v3) with the correct production bucket ARN, and set it as the default version:

aws iam create-policy-version \
  --policy-arn arn:aws:iam::123456789012:policy/deploy-artifacts-v2 \
  --policy-document file://corrected-policy.json \
  --set-as-default

Under the Hood: IAM policies support up to 5 versions. When you create a 6th, you must delete an old version first. The default version is the one that's active. Non-default versions are saved drafts — useful for rollback. Many teams don't know this versioning exists, which means they can't answer "what did this policy look like yesterday?"
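
Because of the five-version limit, automation that publishes policy versions usually has to prune first. Here is a minimal sketch of that decision, assuming input shaped like the `Versions` list from `aws iam list-policy-versions`; the sample data is made up and `version_to_delete` is a hypothetical helper.

```python
from typing import Optional

MAX_VERSIONS = 5


def version_to_delete(versions: list) -> Optional[str]:
    """Pick the oldest non-default version if the policy is at the limit."""
    if len(versions) < MAX_VERSIONS:
        return None  # room for another version, nothing to prune
    candidates = [v for v in versions if not v["IsDefaultVersion"]]
    # ISO 8601 timestamps in the same format sort correctly as strings.
    oldest = min(candidates, key=lambda v: v["CreateDate"])
    return oldest["VersionId"]


versions = [
    {"VersionId": "v5", "IsDefaultVersion": True,  "CreateDate": "2026-01-15T16:42:00Z"},
    {"VersionId": "v4", "IsDefaultVersion": False, "CreateDate": "2025-11-02T10:00:00Z"},
    {"VersionId": "v3", "IsDefaultVersion": False, "CreateDate": "2025-08-20T10:00:00Z"},
    {"VersionId": "v2", "IsDefaultVersion": False, "CreateDate": "2025-05-11T10:00:00Z"},
    {"VersionId": "v1", "IsDefaultVersion": False, "CreateDate": "2025-01-03T10:00:00Z"},
]
print(version_to_delete(versions))
```

The default version is never a candidate, so the active policy stays in place while the oldest draft makes room.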

Common Misconfigurations: The Greatest Hits

These are the patterns that show up in security audits over and over:

1. Wildcard Resource (`"Resource": "*"`)

Grants the action on every resource in the account. An Allow for s3:DeleteObject with "Resource": "*" means every bucket — including backups, audit logs, and billing data.

2. Overly Broad Trust Policy

{"Principal": {"AWS": "*"}}

This allows any AWS principal in any account to assume the role. Functionally open to the entire internet. Use IAM Access Analyzer to find these.

3. Forgotten Policy Attachment

You create a perfect least-privilege policy. You create the role. You forget aws iam attach-role-policy. The role has zero permissions. You spend 30 minutes debugging the policy JSON when no policy was attached at all.

# Quick check: is anything attached?
aws iam list-attached-role-policies --role-name app-role
aws iam list-role-policies --role-name app-role
# Both empty? That's your problem.

4. S3 Bucket vs Object ARN Confusion

ListBucket operates on the bucket (arn:aws:s3:::my-bucket). GetObject operates on objects (arn:aws:s3:::my-bucket/*). You need both ARN forms in your policy. Missing one gives partial access that confuses everyone.
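
A quick pattern check shows why one ARN form is not enough. This sketch approximates IAM wildcard matching with Python's `fnmatch`, and `covered` is a hypothetical helper.

```python
from fnmatch import fnmatchcase

BUCKET_ARN = "arn:aws:s3:::my-bucket"        # target of s3:ListBucket
OBJECT_ARN = "arn:aws:s3:::my-bucket/a/b.txt"  # target of s3:GetObject


def covered(resource_arn: str, patterns: list) -> bool:
    """True if any Resource pattern matches the ARN."""
    return any(fnmatchcase(resource_arn, p) for p in patterns)


objects_only = ["arn:aws:s3:::my-bucket/*"]
both_forms = ["arn:aws:s3:::my-bucket", "arn:aws:s3:::my-bucket/*"]

print(covered(OBJECT_ARN, objects_only))  # objects match the /* pattern
print(covered(BUCKET_ARN, objects_only))  # the bucket itself does not
print(covered(BUCKET_ARN, both_forms))    # listing both forms fixes it
```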

5. Eventual Consistency Surprises

IAM is eventually consistent. Policy changes usually propagate within seconds, but can occasionally take longer. You attach a policy, immediately test, it fails, and you conclude the policy is wrong. Wait a minute and try again before debugging.

Under the Hood: IAM is a global service replicated to every AWS region. When you create a policy, it's written to the primary store and asynchronously replicated. Temporary credentials issued by STS cannot be invalidated before they expire (up to 12 hours for role sessions), but the role's policies are evaluated on every request, so a policy change applies to active sessions once it has propagated. If credentials have leaked and you need to cut off existing sessions while keeping the role usable, use the IAM console's "Revoke active sessions" feature, which adds an inline deny policy conditioned on the token issue time (aws:TokenIssueTime).
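
The deny policy behind session revocation is short enough to build by hand. The sketch below generates a policy of that shape: deny everything for sessions whose token was issued before a cutoff. Treat it as an illustration of the technique rather than the exact document the console writes; `revoke_sessions_policy` is a hypothetical helper.

```python
import json
from datetime import datetime, timezone


def revoke_sessions_policy(issued_before: datetime) -> dict:
    """Deny all actions for sessions issued before the given UTC-aware time."""
    cutoff = issued_before.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Deny",
            "Action": "*",
            "Resource": "*",
            # Sessions with an older token issue time match this condition.
            "Condition": {"DateLessThan": {"aws:TokenIssueTime": cutoff}},
        }],
    }


policy = revoke_sessions_policy(datetime(2026, 1, 15, 16, 42, tzinfo=timezone.utc))
print(json.dumps(policy, indent=2))
```

Sessions created after the cutoff do not match the condition, so the role keeps working for fresh logins while older credentials are shut out.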

Exercises

Exercise 1: Identity Verification (2 minutes)

If you have AWS CLI configured, run:

aws sts get-caller-identity

What to look for

Note the `Arn` field. Is it a user (`arn:aws:iam::...:user/...`), an assumed role (`arn:aws:sts::...:assumed-role/...`), or a root account? If you're on an EC2 instance or in an EKS pod, verify you're using the expected role, not the node instance role.

Exercise 2: Policy Simulation (5 minutes)

Pick a role in your account and simulate an action:

# Replace with a real role ARN and action
aws iam simulate-principal-policy \
  --policy-source-arn arn:aws:iam::YOUR_ACCOUNT:role/YOUR_ROLE \
  --action-names s3:GetObject \
  --resource-arns "arn:aws:s3:::some-bucket/some-key" \
  --query 'EvaluationResults[].{Action:EvalActionName,Decision:EvalDecision}'

Interpreting results

- `allowed`: an identity policy explicitly allows the action
- `implicitDeny`: nothing in the identity policy grants this action
- `explicitDeny`: a Deny statement actively blocks this action

Remember: the simulator gives a partial view. By default it does not evaluate resource-based policies or VPC endpoint policies, and its SCP coverage is limited. A result of `allowed` here does not guarantee the real API call will succeed.

Exercise 3: Find the Overly Permissive Roles (10 minutes)

List all roles in your account with AdministratorAccess attached:

aws iam list-entities-for-policy \
  --policy-arn arn:aws:iam::aws:policy/AdministratorAccess \
  --query '{Users:PolicyUsers[].UserName,Roles:PolicyRoles[].RoleName,Groups:PolicyGroups[].GroupName}'

What to do with the results

Every entry in this list is a potential full-account compromise if the principal is breached. For each role: does it genuinely need admin? Can it be scoped to specific services? For each user: is MFA enabled? For each group: who's in it?

Exercise 4: Cross-Domain Comparison (15 minutes)

Pick a K8s service account in your cluster and compare its debugging flow to IAM:

# K8s: who am I?
kubectl auth whoami

# K8s: can this SA do what I think?
kubectl auth can-i list pods -n production \
    --as=system:serviceaccount:ci:deployer

# K8s: what can this SA do?
kubectl auth can-i --list -n production \
    --as=system:serviceaccount:ci:deployer

What's similar, what's different

Similar: Both use the pattern of checking identity first, then checking what permissions exist. Both default-deny: if no rule explicitly allows, access is blocked.

Different: K8s RBAC has no "explicit deny"; it is purely additive. If a RoleBinding grants access, you can't block it with another role. AWS IAM has explicit deny that overrides all allows. K8s RBAC scope is determined by the binding type (RoleBinding vs ClusterRoleBinding), while AWS IAM scope is determined by the resource ARN in the policy.

Cheat Sheet

Task	Command
Who am I?	`aws sts get-caller-identity`
Simulate permissions	`aws iam simulate-principal-policy --policy-source-arn ARN --action-names ACTION --resource-arns RESOURCE`
Decode auth error	`aws sts decode-authorization-message --encoded-message MSG | jq '.DecodedMessage | fromjson'`
List role's managed policies	`aws iam list-attached-role-policies --role-name ROLE`
List role's inline policies	`aws iam list-role-policies --role-name ROLE`
Get inline policy document	`aws iam get-role-policy --role-name ROLE --policy-name POLICY`
Check permission boundary	`aws iam get-role --role-name ROLE --query 'Role.PermissionsBoundary'`
Get S3 bucket policy	`aws s3api get-bucket-policy --bucket BUCKET | jq '.Policy | fromjson'`
List SCPs on account	`aws organizations list-policies-for-target --target-id ACCT_ID --filter SERVICE_CONTROL_POLICY`
Find admin roles	`aws iam list-entities-for-policy --policy-arn arn:aws:iam::aws:policy/AdministratorAccess`
Assume cross-account role	`aws sts assume-role --role-arn ARN --role-session-name NAME --external-id ID`
Check CloudTrail for denials	`aws cloudtrail lookup-events --lookup-attributes AttributeKey=EventName,AttributeValue=ACTION`

The Access Denied Ladder (memorize this order):

get-caller-identity — am I who I think I am?
simulate-principal-policy — does my identity policy allow this?
decode-authorization-message — what specifically was denied?
Check all layers: identity policy, permission boundary, resource policy, VPC endpoint policy, SCP
cloudtrail lookup-events — when did this start failing and what changed?

Policy Evaluation Order (memorize this):

Explicit Deny > SCP > Resource Policy (same-account shortcut) > Permission Boundary > Session Policy > Identity Policy > Implicit Deny

Takeaways

Start every IAM investigation with aws sts get-caller-identity. Wrong identity is the #1 cause of "Access Denied" — before wrong policy.
Explicit Deny always wins. An explicit Deny in any layer — SCP, identity policy, resource policy, boundary — cannot be overridden. This is the foundational rule of IAM policy evaluation.
Permission systems share a universal pattern. AWS IAM, Linux file permissions, and K8s RBAC all answer the same three questions: who are you, what do you want, are you allowed? The debugging approach transfers across all of them.
Roles with temporary credentials are the default. IAM users with long-lived access keys are the exception, not the norm. If you're creating access keys, ask yourself whether a role would work instead.
"Resource": "*" is a security finding. Every wildcard resource in an Allow statement is a blast radius waiting to expand. Scope to specific ARNs.
Cross-account access requires both sides to agree. The source account's identity policy AND the target account's trust policy must both allow the action. One-sided is denied.

Related Lessons

Permission Denied: The Linux-side version of this lesson (file permissions, sudo, capabilities, SELinux, container users, K8s RBAC). Same debugging ladder, different layers.
Secrets Management Without Tears: What happens after you get permissions right, namely managing the secrets those permissions protect. Covers env vars, Vault, K8s secrets, and rotation.
The Container Escape: When container isolation fails, IAM becomes the last line of defense. Covers namespaces, capabilities, seccomp, and why running as root inside a container is worse than you think.