AWS EC2 - Primer
Why This Matters
EC2 is the workhorse of AWS. Even in a container-first, serverless-leaning world, EC2 instances underpin EKS worker nodes, RDS databases, ElastiCache clusters, and anything else that needs a virtual machine. Understanding instance types, storage options, networking behavior, and pricing models directly impacts your ability to run reliable, cost-effective infrastructure.
When an instance is unreachable, when CPU credits run out at 2 AM, when you lose data because you did not understand the difference between instance store and EBS — that is when EC2 knowledge pays for itself.
Core Concepts
1. Instance Types and Families
Remember: Instance family letter mnemonic: Most workloads (general), Compute, RAM (memory), Tiny-burst (burstable), I/O (storage), Parallel (GPU). The letter tells you what the instance is optimized for.
Every instance type follows the naming convention: <family><generation>[<attributes>].<size>
m7g.xlarge
│││ └── Size: xlarge (4 vCPU, 16 GiB)
││└──── Processor attribute (optional): g = Graviton
│└───── Generation: 7th
└────── Family: m (general purpose)
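To make the naming convention concrete, here is a minimal sketch in POSIX shell that splits a type name into its parts. The function name `parse_instance_type` is hypothetical, and the pattern assumes the common shape of lowercase letters, digits, optional attribute letters, a dot, and a size; special-purpose names that contain hyphens would need extra handling.

```shell
# Split an EC2 instance type name into family, generation, attributes, and size.
# Assumption: the name looks like <letters><digits>[<letters>].<size>, e.g. m7g.xlarge.
parse_instance_type() {
  itype=$1
  prefix=${itype%%.*}   # part before the dot, e.g. m7g
  size=${itype#*.}      # part after the dot, e.g. xlarge
  family=$(printf '%s' "$prefix" | sed -E 's/^([a-z]+)[0-9]+.*$/\1/')
  generation=$(printf '%s' "$prefix" | sed -E 's/^[a-z]+([0-9]+).*$/\1/')
  attrs=$(printf '%s' "$prefix" | sed -E 's/^[a-z]+[0-9]+//')
  printf 'family=%s generation=%s attributes=%s size=%s\n' \
    "$family" "$generation" "${attrs:-none}" "$size"
}

parse_instance_type m7g.xlarge   # family=m generation=7 attributes=g size=xlarge
parse_instance_type c5.large     # family=c generation=5 attributes=none size=large
```

A helper like this can be handy when filtering `describe-instance-types` output by family or generation in a script.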
Instance families:
| Family | Optimized For | Typical Workloads |
|---|---|---|
| m (general) | Balanced CPU/memory | Web servers, app servers, dev/test |
| c (compute) | CPU-intensive | Batch processing, ML inference, gaming |
| r (memory) | Memory-intensive | Databases, in-memory caches, analytics |
| i/d (storage) | High I/O on local NVMe (i) or dense local HDD (d) | Databases needing IOPS, data warehousing |
| t (burstable) | Variable workloads | Dev/test, small databases, microservices |
| p/g (accelerated) | GPU workloads | ML training, video encoding, HPC |
# List the m7-family instance types available in your region (memory is reported in MiB)
aws ec2 describe-instance-types \
--filters "Name=instance-type,Values=m7*" \
--query 'InstanceTypes[].{Type:InstanceType,vCPU:VCpuInfo.DefaultVCpus,MemMiB:MemoryInfo.SizeInMiB}' \
--output table
# Check pricing (use the pricing API or the website)
aws pricing get-products \
--service-code AmazonEC2 \
--filters "Type=TERM_MATCH,Field=instanceType,Value=m7g.xlarge" \
--region us-east-1
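Graviton instances (a g after the generation number, as in m7g; not to be confused with the GPU g family): ARM-based, with 20-40% better price-performance than comparable x86 instances for most workloads. Use them unless your software requires x86.
2. AMIs and the Boot Process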
An Amazon Machine Image (AMI) is a snapshot of a root volume plus metadata (kernel, block device mapping, permissions). Every instance launches from an AMI.
# Find the latest Amazon Linux 2023 AMI
aws ec2 describe-images \
--owners amazon \
--filters "Name=name,Values=al2023-ami-2023*-x86_64" \
--query 'sort_by(Images, &CreationDate)[-1].{ID:ImageId,Name:Name,Date:CreationDate}' \
--output table
# Find the latest Ubuntu 22.04 AMI
aws ec2 describe-images \
--owners 099720109477 \
--filters "Name=name,Values=ubuntu/images/hvm-ssd/ubuntu-jammy-22.04-amd64-server-*" \
--query 'sort_by(Images, &CreationDate)[-1].{ID:ImageId,Name:Name}'
# Create your own AMI from a running instance
aws ec2 create-image \
--instance-id i-abc123 \
--name "app-server-$(date +%Y%m%d)" \
--description "App server with deps pre-installed" \
--no-reboot # skip reboot (risk: inconsistent filesystem)
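Boot order:
1. Instance launches from the AMI and reaches the "running" state (the VM has started; the OS is still booting)
2. Cloud-init runs (reads user data, configures SSH keys, sets hostname)
3. User data script executes (if provided)
A "running" state therefore does not mean that bootstrapping has finished.
3. Instance Store vs EBS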
This is one of the most critical distinctions in EC2.
EBS (Elastic Block Store): network-attached persistent storage. Survives instance stop/start. Can be snapshotted. Can be detached and reattached to another instance.
Instance Store: physically attached NVMe SSDs on the host. Extremely fast but ephemeral — data is lost when the instance is stopped, terminated, or the underlying host fails.
War story: A team ran a self-managed Elasticsearch cluster on i3 instances for the local NVMe performance. When AWS performed scheduled maintenance and stopped the instances, all data on the instance store volumes vanished. They had no replicas configured because "we had three nodes." All three were on the same maintenance schedule. The cluster was empty when it came back up.
Instance Store:
├── Blazing fast (local NVMe)
├── Free (included in price)
├── DATA LOST on stop/terminate
├── Cannot be detached
└── Fixed size per instance type

EBS:
├── Persistent (survives stop)
├── Costs per GB/month + IOPS
├── Snapshots for backup
├── Can resize, change type
└── Up to 64 TiB per volume
EBS volume types:
| Type | IOPS | Throughput | Use Case |
|---|---|---|---|
| gp3 | 3,000 base (up to 16,000) | 125 MiB/s (up to 1,000) | Default for most workloads |
| io2 | Up to 64,000 | Up to 1,000 MiB/s | Databases needing consistent IOPS |
| st1 | Up to 500 (HDD, throughput-oriented) | Baseline 40 MiB/s per TiB, up to 500 MiB/s | Sequential big data, log processing |
| sc1 | Up to 250 (HDD, throughput-oriented) | Baseline 12 MiB/s per TiB, up to 250 MiB/s | Cold storage, infrequent access |
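Because st1 and sc1 baselines scale with volume size, it can help to work the arithmetic. The sketch below uses only the per-TiB rates and ceilings from the table; the function name `hdd_baseline_mibps` is hypothetical, sizes are whole TiB, and treating the "up to" figure as a hard ceiling while ignoring burst credits is a simplifying assumption.

```shell
# Baseline throughput (MiB/s) for a throughput-optimized HDD volume.
# Arguments: per-TiB rate, ceiling, volume size in whole TiB.
# Assumption: the table's "up to" value is used as a simple cap; burst behavior is ignored.
hdd_baseline_mibps() {
  per_tib=$1
  ceiling=$2
  size_tib=$3
  rate=$((per_tib * size_tib))
  if [ "$rate" -gt "$ceiling" ]; then
    rate=$ceiling
  fi
  echo "$rate"
}

hdd_baseline_mibps 40 500 2    # st1, 2 TiB  -> 80
hdd_baseline_mibps 40 500 16   # st1, 16 TiB -> 500 (640 capped at the ceiling)
hdd_baseline_mibps 12 250 4    # sc1, 4 TiB  -> 48
```

The practical takeaway is that a small st1 or sc1 volume has a low baseline, so sizing for throughput can matter as much as sizing for capacity.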
# Create a gp3 volume with custom IOPS and throughput
aws ec2 create-volume \
--volume-type gp3 \
--size 100 \
--iops 6000 \
--throughput 400 \
--availability-zone us-east-1a
# Attach to instance
aws ec2 attach-volume \
--volume-id vol-abc123 \
--instance-id i-abc123 \
--device /dev/xvdf
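# Modify volume (resize or change type; online, no downtime)
# After a resize, grow the partition and filesystem inside the OS (e.g. growpart, then xfs_growfs or resize2fs)
aws ec2 modify-volume --volume-id vol-abc123 --size 200 --iops 8000
4. Key Pairs and SSH Access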
EC2 uses SSH key pairs for Linux access. AWS stores the public key; you keep the private key.
# Create a key pair
aws ec2 create-key-pair --key-name prod-key \
--query 'KeyMaterial' --output text > prod-key.pem
chmod 400 prod-key.pem
# SSH to instance
ssh -i prod-key.pem ec2-user@<public-ip>
Better alternatives to key pairs:
- EC2 Instance Connect: push a temporary SSH key for 60 seconds
- SSM Session Manager: no SSH, no open ports, no key management
5. User Data Scripts
User data runs once at first boot (by default) or every boot (with cloud-init directives). Used for bootstrapping.
#!/bin/bash
# User data example: install and start nginx
yum update -y
yum install -y nginx
systemctl enable nginx
systemctl start nginx
# Write app config from instance metadata
TOKEN=$(curl -sX PUT "http://169.254.169.254/latest/api/token" \
-H "X-aws-ec2-metadata-token-ttl-seconds: 300")
INSTANCE_ID=$(curl -sH "X-aws-ec2-metadata-token: $TOKEN" \
http://169.254.169.254/latest/meta-data/instance-id)
echo "INSTANCE_ID=$INSTANCE_ID" >> /etc/app.conf
# Launch an instance with user data
aws ec2 run-instances \
--image-id ami-abc123 \
--instance-type m7g.large \
--key-name prod-key \
--security-group-ids sg-web \
--subnet-id subnet-pub1a \
--user-data file://bootstrap.sh \
--iam-instance-profile Name=ec2-app-profile
# Retrieve user data from a running instance (base64 encoded)
aws ec2 describe-instance-attribute \
--instance-id i-abc123 --attribute userData \
--query 'UserData.Value' --output text | base64 -d
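If user data is configured to run on every boot, each step should be safe to repeat. One common way to get that is a marker-file guard, sketched below. The `run_once` helper and the marker location are hypothetical and not part of cloud-init or the AWS CLI.

```shell
# Run a command only if its marker file does not exist yet.
run_once() {
  marker=$1
  shift
  if [ -e "$marker" ]; then
    echo "skip: already done ($marker)"
    return 0
  fi
  # Create the marker only when the command succeeds, so a failed step is retried next boot.
  "$@" && touch "$marker"
}

# Demo in a throwaway directory; a real script might keep markers under /var/lib (an assumption).
state_dir=$(mktemp -d)
run_once "$state_dir/nginx-installed" echo "installing nginx"   # runs the command
run_once "$state_dir/nginx-installed" echo "installing nginx"   # skipped on the second call
```

In a bootstrap script the `echo` would be replaced by the real step, for example the package install from the user data example above.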
6. Instance Metadata Service (IMDSv2)
The metadata service at 169.254.169.254 provides instance information, security credentials, and user data. Always use IMDSv2 (token-required) — IMDSv1 (no token) is vulnerable to SSRF attacks.
Under the hood: The metadata service IP 169.254.169.254 is a link-local address. It is not a real server on the network; the hypervisor (Nitro) intercepts packets to this address and responds directly. This is why it works even without a default gateway configured. The Capital One breach of 2019 exploited IMDSv1 via SSRF to steal IAM credentials from this endpoint, which led AWS to create IMDSv2.
# IMDSv2: get a session token first
TOKEN=$(curl -sX PUT "http://169.254.169.254/latest/api/token" \
-H "X-aws-ec2-metadata-token-ttl-seconds: 21600")
# Then use the token for all requests
curl -sH "X-aws-ec2-metadata-token: $TOKEN" \
http://169.254.169.254/latest/meta-data/instance-id
curl -sH "X-aws-ec2-metadata-token: $TOKEN" \
http://169.254.169.254/latest/meta-data/instance-type
curl -sH "X-aws-ec2-metadata-token: $TOKEN" \
http://169.254.169.254/latest/meta-data/placement/availability-zone
# IAM role credentials (temporary, auto-rotated)
curl -sH "X-aws-ec2-metadata-token: $TOKEN" \
http://169.254.169.254/latest/meta-data/iam/security-credentials/ec2-app-role
# Enforce IMDSv2 (disable IMDSv1) — do this on all instances
aws ec2 modify-instance-metadata-options \
--instance-id i-abc123 \
--http-tokens required \
--http-endpoint enabled
7. Placement Groups
Placement groups control how instances are physically placed on hardware.
| Type | Behavior | Use Case |
|---|---|---|
| Cluster | Instances packed close together inside a single Availability Zone | HPC, low-latency networking |
| Spread | Each instance on different hardware | Critical instances that must survive hardware failure |
| Partition | Groups of instances on separate racks | Large distributed systems (HDFS, Cassandra) |
8. Pricing Models
| Model | Commitment | Discount | Best For |
|---|---|---|---|
| On-Demand | None | 0% | Short-term, unpredictable workloads |
| Reserved | 1 or 3 years | Up to 72% | Steady-state, predictable workloads |
| Savings Plans | 1 or 3 years ($/hour spend) | Up to 72% (up to 66% for the fully flexible Compute plan) | Flexible across instance families/regions |
| Spot | None (can be reclaimed) | Up to 90% | Fault-tolerant, flexible workloads |
Spot instances are spare capacity sold at a discount. AWS can reclaim them with 2-minute notice.
Fun fact: Spot instances were originally an auction model (you bid a max price). In November 2017 AWS dropped bidding: prices now adjust gradually based on longer-term supply and demand, and you no longer set a bid. The --spot-price parameter still exists but acts as a ceiling (the most you are willing to pay), not a bid.
# Request spot instances
aws ec2 request-spot-instances \
--spot-price "0.05" \
--instance-count 5 \
--type "one-time" \
--launch-specification '{
"ImageId": "ami-abc123",
"InstanceType": "m7g.large",
"SecurityGroupIds": ["sg-abc123"],
"SubnetId": "subnet-priv1a"
}'
# Check spot pricing history
aws ec2 describe-spot-price-history \
--instance-types m7g.large \
--start-time $(date -u -d '1 day ago' +%Y-%m-%dT%H:%M:%S) \
--product-descriptions "Linux/UNIX" \
--query 'SpotPriceHistory[].[AvailabilityZone,SpotPrice]' \
--output table
9. Auto Scaling Groups (ASG)
ASGs maintain a fleet of instances, automatically scaling based on demand.
# Create launch template (replaces launch configurations)
aws ec2 create-launch-template \
--launch-template-name app-template \
--version-description "v1" \
--launch-template-data '{
"ImageId": "ami-abc123",
"InstanceType": "m7g.large",
"SecurityGroupIds": ["sg-app"],
"IamInstanceProfile": {"Name": "ec2-app-profile"},
"UserData": "'$(base64 -w 0 bootstrap.sh)'",
"MetadataOptions": {"HttpTokens": "required"}
}'
# Create auto scaling group
aws autoscaling create-auto-scaling-group \
--auto-scaling-group-name app-asg \
--launch-template LaunchTemplateName=app-template,Version='$Latest' \
--min-size 2 \
--max-size 10 \
--desired-capacity 3 \
--vpc-zone-identifier "subnet-priv1a,subnet-priv1b" \
--target-group-arns arn:aws:elasticloadbalancing:...:targetgroup/app-tg/... \
--health-check-type ELB \
--health-check-grace-period 300
Scaling policies:
- Target tracking: maintain a metric at a target value (e.g., CPU at 60%)
- Step scaling: add/remove instances based on alarm thresholds
- Scheduled: scale at specific times (e.g., scale up before business hours)
- Predictive: ML-based forecasting of traffic patterns
# Target tracking policy: keep CPU around 60%
# EC2 Auto Scaling target tracking does not accept ScaleInCooldown/ScaleOutCooldown
# (those keys belong to Application Auto Scaling); use --estimated-instance-warmup instead
aws autoscaling put-scaling-policy \
--auto-scaling-group-name app-asg \
--policy-name cpu-target \
--policy-type TargetTrackingScaling \
--estimated-instance-warmup 60 \
--target-tracking-configuration '{
"PredefinedMetricSpecification": {
"PredefinedMetricType": "ASGAverageCPUUtilization"
},
"TargetValue": 60.0
}'
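As a rough intuition for target tracking, the capacity it steers toward is proportional to how far the metric is from the target. The sketch below is an approximation only, not the actual algorithm AWS runs (which works through CloudWatch alarms and is more cautious when scaling in). The function name `estimate_desired_capacity` is hypothetical, metrics are integer percentages, and the group's min/max bounds are ignored.

```shell
# Approximate the capacity a target tracking policy would steer toward:
#   ceil(current_capacity * current_metric / target_metric)
# Assumption: the metric scales inversely with instance count.
estimate_desired_capacity() {
  current_capacity=$1
  current_metric=$2   # e.g. average CPU, integer percent
  target_metric=$3    # e.g. 60
  # Integer ceiling division
  echo $(( (current_capacity * current_metric + target_metric - 1) / target_metric ))
}

estimate_desired_capacity 3 90 60   # 3 instances at 90% CPU, target 60% -> 5
estimate_desired_capacity 4 30 60   # 4 instances at 30% CPU, target 60% -> 2
```

In the second case a real group would still be bounded by its configured minimum size, so the result is only a starting point for reasoning about scale-in.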
10. Instance Lifecycle
pending → running → stopping → stopped → pending → running
→ shutting-down → terminated
Key transitions:
- Stop: instance store data LOST, EBS persists, public IP released
- Start: new host, new public IP (unless Elastic IP), instance store empty
- Reboot: same host, same IPs, instance store preserved
- Terminate: everything gone (unless EBS has DeleteOnTermination=false)
- Hibernate: RAM saved to EBS, faster resume (must be pre-configured)
# Stop (EBS-backed only)
aws ec2 stop-instances --instance-ids i-abc123
# Start
aws ec2 start-instances --instance-ids i-abc123
# Reboot (preferred over stop/start when possible)
aws ec2 reboot-instances --instance-ids i-abc123
# Terminate (destructive!)
aws ec2 terminate-instances --instance-ids i-abc123
# Enable termination protection
aws ec2 modify-instance-attribute \
--instance-id i-abc123 \
--disable-api-termination
11. Nitro System
Nitro is AWS's custom hypervisor and hardware platform. All modern instance types run on Nitro. Benefits:
- Near bare-metal performance (hypervisor offloaded to dedicated hardware)
- Enhanced networking (up to 100 Gbps)
- EBS-optimized by default
- NVMe-based storage interface
- Security: hardware root of trust, encrypted memory
If an instance type uses Nitro, EBS volumes appear as /dev/nvme* devices instead of /dev/xvd*.
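Because the /dev/nvme* names do not match the device name passed to attach-volume, it helps to map a device back to its EBS volume. On Nitro instances the NVMe serial number of an EBS volume is the volume ID without the hyphen, so a small helper can restore it. The function name `nvme_serial_to_volume_id` is hypothetical, and the serial in the demo is a made-up placeholder.

```shell
# Convert an NVMe serial such as vol0123456789abcdef0 into vol-0123456789abcdef0.
nvme_serial_to_volume_id() {
  serial=$1
  case "$serial" in
    vol-*) echo "$serial" ;;                 # already hyphenated
    vol*)  echo "vol-${serial#vol}" ;;       # re-insert the hyphen after "vol"
    *)     echo "not an EBS serial: $serial" >&2; return 1 ;;
  esac
}

nvme_serial_to_volume_id vol0123456789abcdef0   # vol-0123456789abcdef0
```

On an instance, the serials can be listed with `lsblk -o NAME,SERIAL` and passed through this function to match devices against `aws ec2 describe-volumes` output.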
12. EC2 Instance Connect
A safer alternative to managing SSH key pairs. Pushes a temporary public key to the instance metadata for 60 seconds.
# Connect via CLI
aws ec2-instance-connect ssh --instance-id i-abc123
# Or push key and connect manually
aws ec2-instance-connect send-ssh-public-key \
--instance-id i-abc123 \
--instance-os-user ec2-user \
--ssh-public-key file://~/.ssh/id_rsa.pub
# Connect within 60 seconds
ssh -i ~/.ssh/id_rsa ec2-user@<ip>
Key Takeaways
- Use Graviton instances (g suffix) for 20-40% better price-performance unless you need x86
- Instance store is ephemeral — data vanishes on stop/terminate/host failure
- gp3 is the default EBS volume type — you can tune IOPS and throughput independently
- Always enforce IMDSv2 (token-required) to prevent SSRF credential theft
- T-family burstable instances have CPU credits — understand baseline vs burst
- SSM Session Manager is preferred over SSH for production access
- Spot instances save up to 90% but can be reclaimed with 2-minute notice
- Auto Scaling Groups with launch templates are the modern standard for fleets
- Stop/start changes the underlying host and public IP; reboot does not