Quiz: AWS EC2

6 questions

L1 (4 questions)

1. Your t3.micro runs at 15% CPU consistently. Initially responsive, after a week it becomes extremely slow. CPU metrics show 100% utilization in spikes. What is happening?

Show answer

t3.micro has a CPU baseline of 10% per vCPU. At 15% sustained usage, you spend CPU credits faster than you earn them. Initially, the accrued credit balance masks the deficit; note that T3 instances in standard mode, unlike T2, receive no launch credits. Once the credits are depleted, the instance is throttled to baseline (10%), causing the slowdown. In 'standard' mode, the instance cannot burst above baseline with zero credits. In 'unlimited' mode (the default for T3), it can keep bursting, but you pay for the surplus credits per vCPU-hour. Diagnosis: check the CPUCreditBalance CloudWatch metric; a downward trend toward zero confirms credit exhaustion. Fix:
1. Switch to unlimited mode if bursts are occasional.
2. Right-size to a larger t3 or switch to m-family (no credit system) if sustained load exceeds baseline.
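The credit arithmetic behind this answer can be sketched in a few lines. This is a rough model, not AWS's accounting: it assumes 2 vCPUs and a 288-credit maximum balance for t3.micro (figures recalled from AWS's published T3 table, so verify them before relying on the result), and it treats usage as perfectly constant. The function names are illustrative.

```python
def credits_per_hour(vcpus: int, cpu_pct: float) -> float:
    """CPU credits spent (or earned) per hour at a given per-vCPU percentage.

    One CPU credit equals one vCPU running at 100% for one minute.
    """
    return vcpus * cpu_pct * 60 / 100


def hours_until_depleted(balance: float, vcpus: int,
                         baseline_pct: float, usage_pct: float) -> float:
    """Hours until the credit balance hits zero; inf if usage <= baseline."""
    earned = credits_per_hour(vcpus, baseline_pct)
    spent = credits_per_hour(vcpus, usage_pct)
    net_drain = spent - earned
    if net_drain <= 0:
        return float("inf")
    return balance / net_drain


# Assumed t3.micro figures: 2 vCPUs, 10% baseline, 288-credit balance.
print(hours_until_depleted(balance=288, vcpus=2, baseline_pct=10, usage_pct=15))
```

In practice the time to depletion depends on the starting balance and on how usage fluctuates, which is why the CPUCreditBalance trend is the more reliable signal.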

2. You store application state on an i3.large NVMe instance store volume. During an AWS maintenance event, the instance is stopped and started on new hardware. Your data is gone. Was this preventable?

Show answer

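Instance store volumes are ephemeral: data survives a reboot but is lost when the instance is stopped, hibernated, or terminated, or when the underlying disk fails. A stop/start cycle moves the instance to a new host, which is what happened here. This is by design and documented, but commonly overlooked. Preventable: yes.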
1. Use EBS volumes for any data that must survive instance lifecycle events.
2. If you need instance store performance (high IOPS, low latency), use it as a cache or scratch space with data replicated to a durable store (EBS, S3, or another instance).
3. For databases on instance store, use replication (e.g., a replica on another instance) so data survives single-instance loss. The i3 family is designed for high-performance local storage workloads that handle data durability at the application layer, not the infrastructure layer.
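The "cache or scratch space" pattern from point 2 can be illustrated with a small write-through cache. This is a minimal sketch under stated assumptions: `ScratchCache` and both directory arguments are hypothetical names, the scratch directory stands in for an instance store mount, and the durable directory stands in for an EBS mount (a real setup might replicate to S3 instead).

```python
import shutil
from pathlib import Path


class ScratchCache:
    """Write-through cache: durable copy first, fast local copy second."""

    def __init__(self, scratch_dir: str, durable_dir: str) -> None:
        self.scratch = Path(scratch_dir)   # assumed: instance store mount (ephemeral)
        self.durable = Path(durable_dir)   # assumed: EBS mount (survives stop/start)
        self.scratch.mkdir(parents=True, exist_ok=True)
        self.durable.mkdir(parents=True, exist_ok=True)

    def put(self, key: str, data: bytes) -> None:
        # Persist to the durable store before populating the cache, so the
        # scratch volume never holds the only copy.
        (self.durable / key).write_bytes(data)
        (self.scratch / key).write_bytes(data)

    def get(self, key: str) -> bytes:
        fast = self.scratch / key
        if not fast.exists():
            # Cache miss, for example after a stop/start wiped the scratch
            # volume: recreate the directory and repopulate from durable.
            fast.parent.mkdir(parents=True, exist_ok=True)
            shutil.copyfile(self.durable / key, fast)
        return fast.read_bytes()
```

The trade-off is that every write pays the latency of the durable store; only reads benefit from the local NVMe speed.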

3. You need to create a custom AMI from a running instance. What steps should you take before creating the image to ensure it is clean and reusable?

Show answer

1. Remove instance-specific state: delete SSH host keys (/etc/ssh/ssh_host_*), clear /tmp, remove cloud-init state (rm -rf /var/lib/cloud/instances/*).
2. Remove credentials: delete ~/.aws/credentials, ~/.ssh/authorized_keys for non-default users, clear bash history.
3. Generalize the network config: remove /etc/udev/rules.d/70-persistent-net.rules if it exists.
4. Stop unnecessary services and clear logs if size matters.
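5. For the AMI creation: you can create the image from a running instance (AWS snapshots the attached EBS volumes). By default, EC2 reboots the instance as part of image creation; with the no-reboot option, the filesystem integrity of the image is not guaranteed. Stopping the instance first ensures a consistent filesystem state, so in-flight writes are not captured mid-transaction.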
6. Tag the AMI with the source commit, build date, and base OS version for traceability.
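Steps 1 and 3 of the checklist above can be sketched as a small cleanup helper. It is an illustration only: the function name and the `root` and `dry_run` parameters are assumptions, the patterns cover just the paths named in the checklist, and it defaults to a dry run so that nothing is deleted by accident.

```python
import glob
import os
import shutil

# Paths taken from the checklist, written relative to the filesystem root.
CLEANUP_PATTERNS = [
    "etc/ssh/ssh_host_*",
    "var/lib/cloud/instances/*",
    "etc/udev/rules.d/70-persistent-net.rules",
    "tmp/*",  # note: glob does not match dotfiles here
]


def clean_for_imaging(root: str = "/", dry_run: bool = True) -> list[str]:
    """Return matching paths under root; delete them unless dry_run is True."""
    matches: list[str] = []
    for pattern in CLEANUP_PATTERNS:
        matches.extend(sorted(glob.glob(os.path.join(root, pattern))))
    if not dry_run:
        for path in matches:
            if os.path.isdir(path) and not os.path.islink(path):
                shutil.rmtree(path)
            else:
                os.remove(path)
    return matches
```

Running it first with `dry_run=True` and reviewing the returned list is a cheap safeguard before baking the image.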

4. A Spot Instance running your batch job is interrupted with a 2-minute warning. How do you handle this gracefully without losing work?

Show answer

1. Poll the instance metadata endpoint for the interruption notice: GET http://169.254.169.254/latest/meta-data/spot/instance-action — returns the action (terminate/stop/hibernate) and time.
2. Use a shutdown script or signal handler: when the notice appears, checkpoint current work to S3 or EBS, deregister from the load balancer, and flush logs.
3. Design for resumability: store job progress so another instance can resume from the last checkpoint, not restart from scratch.
4. Use Spot Instance interruption notices in EventBridge to trigger an external handler (Lambda) that starts a replacement instance before the current one terminates.
5. For ECS/EKS: use Spot Instance draining to gracefully remove tasks before termination. The 2-minute window is short — your checkpoint logic must be fast.
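Step 1 can be sketched with the standard library alone. The sketch assumes the instance requires IMDSv2 (so it fetches a session token first), that the endpoint returns HTTP 404 while no interruption is scheduled, and that the notice body is JSON with `action` and `time` fields in the `YYYY-MM-DDTHH:MM:SSZ` form. The function names are illustrative.

```python
import json
import urllib.error
import urllib.request
from datetime import datetime, timezone

IMDS = "http://169.254.169.254"


def fetch_instance_action(timeout: float = 1.0):
    """Return the raw interruption notice, or None if none is scheduled."""
    token_req = urllib.request.Request(
        f"{IMDS}/latest/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": "60"},
    )
    with urllib.request.urlopen(token_req, timeout=timeout) as resp:
        token = resp.read().decode()
    req = urllib.request.Request(
        f"{IMDS}/latest/meta-data/spot/instance-action",
        headers={"X-aws-ec2-metadata-token": token},
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.read().decode()
    except urllib.error.HTTPError as err:
        if err.code == 404:  # assumed meaning: no interruption scheduled
            return None
        raise


def parse_notice(body: str):
    """Parse the notice JSON into (action, deadline as an aware UTC datetime)."""
    doc = json.loads(body)
    deadline = datetime.strptime(doc["time"], "%Y-%m-%dT%H:%M:%SZ")
    return doc["action"], deadline.replace(tzinfo=timezone.utc)


def seconds_left(deadline: datetime, now: datetime) -> float:
    """Time budget remaining for checkpointing."""
    return (deadline - now).total_seconds()
```

A worker would call `fetch_instance_action()` every few seconds and, once it returns a notice, pass the parsed deadline to its own checkpoint routine so the routine knows how much time remains.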

L2 (2 questions)

1. Your application fetches IAM role credentials from the instance metadata service at http://169.254.169.254. A vulnerability in your web app allows SSRF. What is the risk, and how does IMDSv2 mitigate it?

Show answer

IMDSv1 responds to simple HTTP GET requests. An SSRF vulnerability lets an attacker make the server request http://169.254.169.254/latest/meta-data/iam/security-credentials/ROLE_NAME and exfiltrate temporary IAM credentials (access key, secret key, session token). These credentials carry the full permissions of the instance role. IMDSv2 mitigates this by requiring a session token: first send a PUT to /latest/api/token with a TTL header (X-aws-ec2-metadata-token-ttl-seconds), then pass the returned token in the X-aws-ec2-metadata-token header on subsequent GET requests. Most SSRF vulnerabilities can only trigger GET requests and cannot set custom headers, so the attacker can neither obtain nor present a token. IMDSv2 also rejects token requests that carry an X-Forwarded-For header, which blocks misconfigured reverse proxies, and the token response is sent with an IP hop limit of 1 by default, so it cannot be routed beyond the instance. Enforce IMDSv2 by setting HttpTokens=required on the instance.
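To find instances that still accept IMDSv1, one option is to audit their metadata options. The sketch below assumes the input is a list of instance dictionaries shaped like the entries in an EC2 DescribeInstances response, each with `InstanceId` and a `MetadataOptions` mapping holding `HttpTokens` and `HttpEndpoint`. The function name and the sample data are illustrative.

```python
def instances_allowing_imdsv1(instances: list[dict]) -> list[str]:
    """Return IDs of instances whose metadata endpoint still accepts IMDSv1."""
    flagged = []
    for inst in instances:
        opts = inst.get("MetadataOptions", {})
        # Assumption: a missing HttpEndpoint value means the endpoint is enabled.
        endpoint_on = opts.get("HttpEndpoint", "enabled") == "enabled"
        tokens_required = opts.get("HttpTokens") == "required"
        if endpoint_on and not tokens_required:
            flagged.append(inst["InstanceId"])
    return flagged


# Illustrative sample data, not real instances.
sample = [
    {"InstanceId": "i-aaa", "MetadataOptions": {"HttpTokens": "optional", "HttpEndpoint": "enabled"}},
    {"InstanceId": "i-bbb", "MetadataOptions": {"HttpTokens": "required", "HttpEndpoint": "enabled"}},
    {"InstanceId": "i-ccc", "MetadataOptions": {"HttpTokens": "optional", "HttpEndpoint": "disabled"}},
]
print(instances_allowing_imdsv1(sample))
```

Instances with the endpoint disabled are skipped because there is nothing for an SSRF request to reach.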

2. You have a Reserved Instance for m5.large in us-east-1a. You want to use m5.xlarge in us-east-1b instead. Can you modify the RI, and what are the constraints?

Show answer

You can modify a Standard RI within the same instance family and region. m5.large to m5.xlarge stays within the m5 family, so the size change is allowed for Linux/UNIX RIs with default tenancy, but the modification must preserve the total normalized footprint: large = 4 units and xlarge = 8 units, so you need two m5.large RIs to produce one m5.xlarge RI, and a single m5.large RI covers only half an xlarge. The AZ change within the same region is allowed (us-east-1a to us-east-1b). However, you cannot change the instance family (m5 to c5), OS platform (Linux to Windows), or tenancy (shared to dedicated). For those changes, you need to sell the RI on the Reserved Instance Marketplace and purchase a new one, or use Savings Plans instead, which offer more flexibility across instance families.
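The normalization arithmetic can be checked with a short sketch. The factors for large (4) and xlarge (8) come from the answer above; the other entries are recalled from AWS's published table and should be treated as assumptions. The function names are illustrative.

```python
from fractions import Fraction

# Normalization factors per instance size (large and xlarge are from the text;
# the rest are assumed from AWS's published table).
NORMALIZATION = {
    "small": Fraction(1),
    "medium": Fraction(2),
    "large": Fraction(4),
    "xlarge": Fraction(8),
    "2xlarge": Fraction(16),
}


def footprint(size: str, count: int = 1) -> Fraction:
    """Total normalized units for `count` reservations of `size`."""
    return NORMALIZATION[size] * count


def can_modify(src_size: str, src_count: int,
               dst_size: str, dst_count: int) -> bool:
    """A size modification must keep the total footprint unchanged."""
    return footprint(src_size, src_count) == footprint(dst_size, dst_count)


def ris_needed(src_size: str, dst_size: str, dst_count: int = 1) -> Fraction:
    """How many source-size RIs equal the footprint of the target."""
    return footprint(dst_size, dst_count) / footprint(src_size)


print(can_modify("large", 1, "xlarge", 1))   # one large cannot become one xlarge
print(can_modify("large", 2, "xlarge", 1))   # two large RIs match one xlarge
print(ris_needed("large", "xlarge"))
```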