Portal | Level: L1: Foundations | Topics: AWS EC2, Cloud Deep Dive | Domain: Cloud

AWS EC2 - Primer

Why This Matters

EC2 is the workhorse of AWS. Even in a container-first, serverless-leaning world, EC2 instances underpin EKS worker nodes, RDS databases, ElastiCache clusters, and anything else that needs a virtual machine. Understanding instance types, storage options, networking behavior, and pricing models directly impacts your ability to run reliable, cost-effective infrastructure.

When an instance is unreachable, when CPU credits run out at 2 AM, when you lose data because you did not understand the difference between instance store and EBS — that is when EC2 knowledge pays for itself.

Core Concepts

1. Instance Types and Families

Remember: Instance family letter mnemonic: Most workloads (general), Compute, RAM (memory), Tiny-burst (burstable), I/O (storage), Parallel (GPU). The letter tells you what the instance is optimized for.

Every instance type follows the naming convention: <family><generation>[<attributes>].<size>

m7g.xlarge
│││ └── Size: xlarge (4 vCPU, 16 GiB)
││└──── Attribute (optional): g = Graviton processor
│└───── Generation: 7th
└────── Family: m (general purpose)
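
The naming convention above can be split mechanically. The following is a minimal sketch in POSIX shell. The function name `parse_instance_type` is made up for illustration, and it assumes the common family, generation, optional attributes, dot, size shape, so unusual names such as `u-6tb1.metal` are not handled.

```shell
#!/bin/sh
# parse_instance_type is a hypothetical helper, not part of the AWS CLI.
# It splits a type such as m7g.xlarge into its naming components.
parse_instance_type() (
  name=$1
  case $name in
    *.*) ;;                                   # needs a size after a dot
    *) echo "not an instance type: $name" >&2; exit 1 ;;
  esac
  size=${name#*.}                             # text after the dot: xlarge
  prefix=${name%%.*}                          # text before the dot: m7g
  family=${prefix%%[0-9]*}                    # leading letters: m
  rest=${prefix#"$family"}                    # generation + attributes: 7g
  generation=${rest%%[!0-9]*}                 # leading digits: 7
  attributes=${rest#"$generation"}            # trailing letters: g (may be empty)
  if [ -z "$family" ] || [ -z "$generation" ]; then
    echo "not an instance type: $name" >&2
    exit 1
  fi
  echo "family=$family generation=$generation attributes=$attributes size=$size"
)

parse_instance_type m7g.xlarge    # family=m generation=7 attributes=g size=xlarge
parse_instance_type c5n.18xlarge  # family=c generation=5 attributes=n size=18xlarge
```

The function body runs in a subshell (parentheses instead of braces) so its variables do not leak into the calling shell.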

Instance families:

Family             Optimized For         Examples
m (general)        Balanced CPU/memory   Web servers, app servers, dev/test
c (compute)        CPU-intensive         Batch processing, ML inference, gaming
r (memory)         Memory-intensive      Databases, in-memory caches, analytics
i/d (storage)      High I/O, local NVMe  Databases needing IOPS, data warehousing
t (burstable)      Variable workloads    Dev/test, small databases, microservices
p/g (accelerated)  GPU workloads         ML training, video encoding, HPC

# List the m7-family instance types available in your region
aws ec2 describe-instance-types \
  --query 'InstanceTypes[].{Type:InstanceType,vCPU:VCpuInfo.DefaultVCpus,MemMiB:MemoryInfo.SizeInMiB}' \
  --filters "Name=instance-type,Values=m7*" \
  --output table

# Check pricing (use the pricing API or the website)
aws pricing get-products \
  --service-code AmazonEC2 \
  --filters "Type=TERM_MATCH,Field=instanceType,Value=m7g.xlarge" \
  --region us-east-1

Graviton instances (suffix g): ARM-based, typically cited as 20-40% better price-performance than comparable x86 instances, with the actual gain depending on the workload. Use them unless your software requires x86.
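
One practical consequence of choosing Graviton is that the AMI architecture has to match: an arm64 instance type needs an arm64 image. The sketch below applies only the "g after the generation number" naming rule described above. `guess_arch` is a hypothetical helper, and the rule is a heuristic: it misses older Arm families without the suffix (for example a1). The authoritative answer is the `ProcessorInfo.SupportedArchitectures` field returned by `aws ec2 describe-instance-types`.

```shell
#!/bin/sh
# guess_arch is a hypothetical helper. It only applies the naming rule
# "a g among the attribute letters means Graviton"; it does not call AWS.
guess_arch() (
  prefix=${1%%.*}                   # m7g.xlarge -> m7g
  family=${prefix%%[0-9]*}          # m
  rest=${prefix#"$family"}          # 7g
  generation=${rest%%[!0-9]*}       # 7
  attributes=${rest#"$generation"}  # g
  case $attributes in
    *g*) echo arm64 ;;              # Graviton attribute present
    *)   echo x86_64 ;;             # assumption: everything else is x86
  esac
)

guess_arch m7g.xlarge   # arm64
guess_arch g5.xlarge    # x86_64 (here g is the GPU family letter, not a suffix)
```

The result can be used to choose the AMI name filter, for example an `-arm64` suffix instead of `-x86_64` in the Amazon Linux 2023 image name.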

2. AMIs and the Boot Process

An Amazon Machine Image (AMI) is a snapshot of a root volume plus metadata (kernel, block device mapping, permissions). Every instance launches from an AMI.

# Find the latest Amazon Linux 2023 AMI
aws ec2 describe-images \
  --owners amazon \
  --filters "Name=name,Values=al2023-ami-2023*-x86_64" \
  --query 'sort_by(Images, &CreationDate)[-1].{ID:ImageId,Name:Name,Date:CreationDate}' \
  --output table

# Find the latest Ubuntu 22.04 AMI
aws ec2 describe-images \
  --owners 099720109477 \
  --filters "Name=name,Values=ubuntu/images/hvm-ssd/ubuntu-jammy-22.04-amd64-server-*" \
  --query 'sort_by(Images, &CreationDate)[-1].{ID:ImageId,Name:Name}'

# Create your own AMI from a running instance
aws ec2 create-image \
  --instance-id i-abc123 \
  --name "app-server-$(date +%Y%m%d)" \
  --description "App server with deps pre-installed" \
  --no-reboot  # skip reboot (risk: inconsistent filesystem)

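Boot order:

  1. The instance launches from the AMI and enters the "running" state as soon as the OS starts to boot
  2. Cloud-init runs (configures SSH keys, sets the hostname, reads user data)
  3. The user data script executes (if provided)
  4. Status checks pass and the instance is ready for traffic

The "running" state does not mean that user data has finished.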

3. Instance Store vs EBS

This is one of the most critical distinctions in EC2.

EBS (Elastic Block Store): network-attached persistent storage. Survives instance stop/start. Can be snapshotted. Can be detached and reattached to another instance.

Instance Store: physically attached NVMe SSDs on the host. Extremely fast but ephemeral — data is lost when the instance is stopped, terminated, or the underlying host fails.

War story: A team ran a self-managed Elasticsearch cluster on i3 instances for the local NVMe performance. When AWS performed scheduled maintenance and stopped the instances, all data on the instance store volumes vanished. They had no replicas configured because "we had three nodes." All three were on the same maintenance schedule. The cluster was empty when it came back up.

Instance Store:                    EBS:
├── Blazing fast (local NVMe)      ├── Persistent (survives stop)
├── Free (included in price)       ├── Costs per GB/month + IOPS
├── DATA LOST on stop/terminate    ├── Snapshots for backup
├── Cannot be detached             ├── Can resize, change type
└── Fixed size per instance type   └── Up to 64 TiB per volume

EBS volume types:

Type  IOPS                       Throughput                                  Use Case
gp3   3,000 base (up to 16,000)  125 MiB/s base (up to 1,000 MiB/s)          Default for most workloads
io2   Up to 64,000               Up to 1,000 MiB/s                           Databases needing consistent IOPS
st1   HDD, sized by throughput   40 MiB/s per TiB baseline, up to 500 MiB/s  Sequential big data, log processing
sc1   HDD, sized by throughput   12 MiB/s per TiB baseline, up to 250 MiB/s  Cold storage, infrequent access

# Create a gp3 volume with custom IOPS and throughput
aws ec2 create-volume \
  --volume-type gp3 \
  --size 100 \
  --iops 6000 \
  --throughput 400 \
  --availability-zone us-east-1a

# Attach to instance
aws ec2 attach-volume \
  --volume-id vol-abc123 \
  --instance-id i-abc123 \
  --device /dev/xvdf

# Modify volume (resize or change type online; after growing it, extend the partition and filesystem inside the OS)
aws ec2 modify-volume --volume-id vol-abc123 --size 200 --iops 8000
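
For the HDD-backed types, baseline throughput scales with volume size. As a rough illustration, the sketch below applies the per-TiB baselines from the volume type table and treats the "up to" figure as a per-volume ceiling. `hdd_baseline_mibs` is a hypothetical helper, and the integer math rounds down.

```shell
#!/bin/sh
# hdd_baseline_mibs is a hypothetical helper that applies the per-TiB
# baselines and per-volume ceilings listed in the EBS volume type table.
# Usage: hdd_baseline_mibs <st1|sc1> <size in GiB>
hdd_baseline_mibs() (
  volume_type=$1
  size_gib=$2
  case $volume_type in
    st1) per_tib=40; cap=500 ;;
    sc1) per_tib=12; cap=250 ;;
    *) echo "unsupported volume type: $volume_type" >&2; exit 1 ;;
  esac
  baseline=$(( size_gib * per_tib / 1024 ))   # integer MiB/s, rounded down
  if [ "$baseline" -gt "$cap" ]; then
    baseline=$cap                             # never above the ceiling
  fi
  echo "$baseline"
)

hdd_baseline_mibs st1 2048    # 80
hdd_baseline_mibs st1 16384   # 500 (capped)
hdd_baseline_mibs sc1 4096    # 48
```

A small st1 volume therefore has a low baseline; sizing the volume for throughput rather than capacity is often the deciding factor.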

4. Key Pairs and SSH Access

EC2 uses SSH key pairs for Linux access. AWS stores the public key; you keep the private key.

# Create a key pair
aws ec2 create-key-pair --key-name prod-key \
  --query 'KeyMaterial' --output text > prod-key.pem
chmod 400 prod-key.pem

# SSH to instance
ssh -i prod-key.pem ec2-user@<public-ip>

Better alternatives to key pairs:

  1. EC2 Instance Connect: push a temporary SSH key for 60 seconds

    aws ec2-instance-connect send-ssh-public-key \
      --instance-id i-abc123 \
      --availability-zone us-east-1a \
      --instance-os-user ec2-user \
      --ssh-public-key file://~/.ssh/id_rsa.pub
    

  2. SSM Session Manager: no SSH, no open ports, no key management

    aws ssm start-session --target i-abc123
    

5. User Data Scripts

User data runs once at first boot (by default) or every boot (with cloud-init directives). Used for bootstrapping.

#!/bin/bash
# User data example: install and start nginx
yum update -y
yum install -y nginx
systemctl enable nginx
systemctl start nginx

# Write app config from instance metadata
TOKEN=$(curl -sX PUT "http://169.254.169.254/latest/api/token" \
  -H "X-aws-ec2-metadata-token-ttl-seconds: 300")
INSTANCE_ID=$(curl -sH "X-aws-ec2-metadata-token: $TOKEN" \
  http://169.254.169.254/latest/meta-data/instance-id)
echo "INSTANCE_ID=$INSTANCE_ID" >> /etc/app.conf

# Launch an instance with user data
aws ec2 run-instances \
  --image-id ami-abc123 \
  --instance-type m7g.large \
  --key-name prod-key \
  --security-group-ids sg-web \
  --subnet-id subnet-pub1a \
  --user-data file://bootstrap.sh \
  --iam-instance-profile Name=ec2-app-profile

# Retrieve user data from a running instance (base64 encoded)
aws ec2 describe-instance-attribute \
  --instance-id i-abc123 --attribute userData \
  --query 'UserData.Value' --output text | base64 -d
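
EC2 limits user data to 16 KB in raw form, before base64 encoding, so a bootstrap script that has grown over time can fail at launch. The sketch below is a pre-flight check to run before `run-instances`. `check_user_data_size` is a hypothetical helper name.

```shell
#!/bin/sh
# check_user_data_size is a hypothetical pre-flight helper.
# EC2 limits user data to 16 KB in raw form (before base64 encoding).
USER_DATA_LIMIT_BYTES=16384

check_user_data_size() (
  file=$1
  [ -f "$file" ] || { echo "no such file: $file" >&2; exit 2; }
  bytes=$(wc -c < "$file" | tr -d ' ')        # strip padding some wc builds add
  if [ "$bytes" -gt "$USER_DATA_LIMIT_BYTES" ]; then
    echo "$file is $bytes bytes, over the $USER_DATA_LIMIT_BYTES byte limit" >&2
    exit 1
  fi
  echo "$file is $bytes bytes, within the limit"
)
```

Typical use is `check_user_data_size bootstrap.sh && aws ec2 run-instances ...`. When a script outgrows the limit, a common pattern is to keep user data small and have it download the real bootstrap from S3.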

6. Instance Metadata Service (IMDSv2)

The metadata service at 169.254.169.254 provides instance information, security credentials, and user data. Always use IMDSv2 (token-required) — IMDSv1 (no token) is vulnerable to SSRF attacks.

Under the hood: The metadata service IP 169.254.169.254 is a link-local address. It is not a real server on the network — the hypervisor (Nitro) intercepts packets to this address and responds directly. This is why it works even without a default gateway configured. The Capital One breach of 2019 exploited IMDSv1 via SSRF to steal IAM credentials from this endpoint, which led AWS to create IMDSv2.

# IMDSv2: get a session token first
TOKEN=$(curl -sX PUT "http://169.254.169.254/latest/api/token" \
  -H "X-aws-ec2-metadata-token-ttl-seconds: 21600")

# Then use the token for all requests
curl -sH "X-aws-ec2-metadata-token: $TOKEN" \
  http://169.254.169.254/latest/meta-data/instance-id

curl -sH "X-aws-ec2-metadata-token: $TOKEN" \
  http://169.254.169.254/latest/meta-data/instance-type

curl -sH "X-aws-ec2-metadata-token: $TOKEN" \
  http://169.254.169.254/latest/meta-data/placement/availability-zone

# IAM role credentials (temporary, auto-rotated)
curl -sH "X-aws-ec2-metadata-token: $TOKEN" \
  http://169.254.169.254/latest/meta-data/iam/security-credentials/ec2-app-role

# Enforce IMDSv2 (disable IMDSv1); do this on all instances
aws ec2 modify-instance-metadata-options \
  --instance-id i-abc123 \
  --http-tokens required \
  --http-endpoint enabled
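
Applying the enforcement command to every instance by hand does not scale. The sketch below finds instances that still accept IMDSv1 and, optionally, switches them to token-required. The function names and the `DRY_RUN` variable are made up for illustration; the script assumes credentials and a default region are already configured, and it defaults to a dry run because the change can break software that still uses IMDSv1.

```shell
#!/bin/sh
# Hypothetical helpers; they wrap real AWS CLI commands but are not part of it.

# Print the IDs of instances whose metadata options still allow IMDSv1.
list_imdsv1_instances() {
  aws ec2 describe-instances \
    --filters "Name=metadata-options.http-tokens,Values=optional" \
              "Name=instance-state-name,Values=running,stopped" \
    --query 'Reservations[].Instances[].InstanceId' \
    --output text
}

# Require session tokens on each of them. DRY_RUN defaults to 1 (report only);
# set DRY_RUN=0 to actually apply the change.
enforce_imdsv2_all() (
  for id in $(list_imdsv1_instances); do
    if [ "${DRY_RUN:-1}" = "1" ]; then
      echo "would enforce IMDSv2 on $id"
    else
      aws ec2 modify-instance-metadata-options \
        --instance-id "$id" \
        --http-tokens required \
        --http-endpoint enabled >/dev/null &&
        echo "enforced IMDSv2 on $id"
    fi
  done
)
```

Run `enforce_imdsv2_all` first to review the list, then `DRY_RUN=0 enforce_imdsv2_all` once the affected workloads are known to use tokens.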

7. Placement Groups

Placement groups control how instances are physically placed on hardware.

Type       Behavior                                             Use Case
Cluster    Instances packed close together within a single AZ  HPC, low-latency networking
Spread     Each instance on different hardware                  Critical instances that must survive hardware failure
Partition  Groups of instances on separate racks                Large distributed systems (HDFS, Cassandra)

aws ec2 create-placement-group \
  --group-name hpc-cluster \
  --strategy cluster

8. Pricing Models

Model          Commitment                     Discount   Best For
On-Demand      None                           0%         Short-term, unpredictable workloads
Reserved       1 or 3 years                   Up to 72%  Steady-state, predictable workloads
Savings Plans  $/hour spend for 1 or 3 years  Up to 72%  Flexible across instance families/regions
Spot           None (can be reclaimed)        Up to 90%  Fault-tolerant, flexible workloads
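
To make the discount column concrete, the sketch below turns an hourly on-demand price and a discount percentage into an effective hourly price and a 730-hour monthly cost. `discounted_cost` is a hypothetical helper, the $0.20/hour price is invented for the example, and the percentages are the table's upper bounds, not a quote for any real instance type.

```shell
#!/bin/sh
# discounted_cost is a hypothetical helper. Arguments:
#   $1  on-demand hourly price in USD (supplied by the caller, not looked up)
#   $2  discount in percent (0-100)
# Prints the effective hourly price and the cost of a 730-hour month.
discounted_cost() {
  LC_ALL=C awk -v price="$1" -v pct="$2" 'BEGIN {
    hourly = price * (100 - pct) / 100
    printf "hourly=%.4f monthly=%.2f\n", hourly, hourly * 730
  }'
}

# Made-up on-demand price of $0.20/hour
discounted_cost 0.20 0    # On-Demand
discounted_cost 0.20 72   # best-case Reserved / Savings Plans
discounted_cost 0.20 90   # best-case Spot
```

Real discounts vary by instance type, region, term, and payment option, so the output is only useful for comparing orders of magnitude.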

Spot instances are spare capacity sold at a discount. AWS can reclaim them with 2-minute notice.

Fun fact: Spot instances originally used an auction model (you bid a maximum price). In November 2017 AWS replaced bidding with prices that it sets and adjusts gradually based on longer-term supply and demand, so you no longer place a bid. The --spot-price parameter still exists, but it acts as a price ceiling rather than a bid.

# Request spot instances
aws ec2 request-spot-instances \
  --spot-price "0.05" \
  --instance-count 5 \
  --type "one-time" \
  --launch-specification '{
    "ImageId": "ami-abc123",
    "InstanceType": "m7g.large",
    "SecurityGroupIds": ["sg-abc123"],
    "SubnetId": "subnet-priv1a"
  }'

# Check spot pricing history
aws ec2 describe-spot-price-history \
  --instance-types m7g.large \
  --start-time $(date -u -d '1 day ago' +%Y-%m-%dT%H:%M:%S) \
  --product-descriptions "Linux/UNIX" \
  --query 'SpotPriceHistory[].[AvailabilityZone,SpotPrice]' \
  --output table
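
The two-minute reclaim notice is delivered through the instance metadata service: the `spot/instance-action` path returns 404 until an interruption is scheduled, then a small JSON document. The sketch below polls that path using IMDSv2 and runs a caller-supplied command when a notice appears. The function names, `IMDS_BASE`, and `POLL_SECONDS` are assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical watcher for the two-minute Spot interruption notice.
# IMDS_BASE is overridable only so the functions can be exercised off-EC2.
IMDS_BASE=${IMDS_BASE:-http://169.254.169.254}

# Prints the notice (JSON) and returns 0 if an interruption is pending.
# Returns non-zero when IMDS answers 404 (nothing scheduled) or is unreachable.
spot_interruption_notice() (
  token=$(curl -sf -X PUT "$IMDS_BASE/latest/api/token" \
    -H "X-aws-ec2-metadata-token-ttl-seconds: 60") || exit 2
  curl -sf -H "X-aws-ec2-metadata-token: $token" \
    "$IMDS_BASE/latest/meta-data/spot/instance-action"
)

# Poll until a notice appears, then run the command given as arguments,
# for example a script that drains connections and deregisters the node.
watch_spot_interruption() {
  until spot_interruption_notice; do
    sleep "${POLL_SECONDS:-5}"
  done
  "$@"
}
```

A typical call is `watch_spot_interruption /usr/local/bin/drain.sh`, where the drain script path is a placeholder. Outside EC2 the loop never ends, because the token request keeps failing.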

9. Auto Scaling Groups (ASG)

ASGs maintain a fleet of instances, automatically scaling based on demand.

# Create launch template (replaces launch configurations)
aws ec2 create-launch-template \
  --launch-template-name app-template \
  --version-description "v1" \
  --launch-template-data '{
    "ImageId": "ami-abc123",
    "InstanceType": "m7g.large",
    "SecurityGroupIds": ["sg-app"],
    "IamInstanceProfile": {"Name": "ec2-app-profile"},
    "UserData": "'$(base64 -w 0 bootstrap.sh)'",
    "MetadataOptions": {"HttpTokens": "required"}
  }'

# Create auto scaling group
aws autoscaling create-auto-scaling-group \
  --auto-scaling-group-name app-asg \
  --launch-template LaunchTemplateName=app-template,Version='$Latest' \
  --min-size 2 \
  --max-size 10 \
  --desired-capacity 3 \
  --vpc-zone-identifier "subnet-priv1a,subnet-priv1b" \
  --target-group-arns arn:aws:elasticloadbalancing:...:targetgroup/app-tg/... \
  --health-check-type ELB \
  --health-check-grace-period 300

Scaling policies:

  • Target tracking: maintain a metric at a target value (e.g., CPU at 60%)
  • Step scaling: add/remove instances based on alarm thresholds
  • Scheduled: scale at specific times (e.g., scale up before business hours)
  • Predictive: ML-based forecasting of traffic patterns

# Target tracking policy: keep CPU around 60%
# (EC2 Auto Scaling target tracking has no per-policy cooldown fields;
# --estimated-instance-warmup tells it how long new instances take to warm up)
aws autoscaling put-scaling-policy \
  --auto-scaling-group-name app-asg \
  --policy-name cpu-target \
  --policy-type TargetTrackingScaling \
  --estimated-instance-warmup 60 \
  --target-tracking-configuration '{
    "PredefinedMetricSpecification": {
      "PredefinedMetricType": "ASGAverageCPUUtilization"
    },
    "TargetValue": 60.0,
    "DisableScaleIn": false
  }'

10. Instance Lifecycle

pending → running → stopping → stopped → pending → running
                  → shutting-down → terminated

Key transitions:
- Stop:      instance store data LOST, EBS persists, public IP released
- Start:     new host, new public IP (unless Elastic IP), instance store empty
- Reboot:    same host, same IPs, instance store preserved
- Terminate: everything gone (unless EBS has DeleteOnTermination=false)
- Hibernate: RAM saved to EBS, faster resume (must be pre-configured)

# Stop (EBS-backed only)
aws ec2 stop-instances --instance-ids i-abc123

# Start
aws ec2 start-instances --instance-ids i-abc123

# Reboot (preferred over stop/start when possible)
aws ec2 reboot-instances --instance-ids i-abc123

# Terminate (destructive!)
aws ec2 terminate-instances --instance-ids i-abc123

# Enable termination protection
aws ec2 modify-instance-attribute \
  --instance-id i-abc123 \
  --disable-api-termination
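
Because a stop followed by a start normally moves the instance to a different host, the pair is a common way to get off degraded hardware. The sketch below chains the commands with the CLI's built-in waiters and prints the public IP afterwards, since it may have changed. `cycle_instance` is a hypothetical helper; remember that any instance store data is lost in the process.

```shell
#!/bin/sh
# cycle_instance is a hypothetical helper: a full stop/start, which
# (unlike reboot) normally lands the instance on a different host.
# WARNING: instance store data does not survive this.
cycle_instance() {
  instance_id=$1
  aws ec2 stop-instances --instance-ids "$instance_id" >/dev/null || return 1
  aws ec2 wait instance-stopped --instance-ids "$instance_id" || return 1
  aws ec2 start-instances --instance-ids "$instance_id" >/dev/null || return 1
  aws ec2 wait instance-running --instance-ids "$instance_id" || return 1
  # Print the current public IP ("None" if the instance has none).
  aws ec2 describe-instances --instance-ids "$instance_id" \
    --query 'Reservations[0].Instances[0].PublicIpAddress' --output text
}
```

Usage: `cycle_instance i-abc123`. The waiters poll until the state is reached or they time out, in which case the function stops and returns a non-zero status.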

11. Nitro System

Nitro is AWS's custom hypervisor and hardware platform. All modern instance types run on Nitro. Benefits:

  • Near bare-metal performance (hypervisor offloaded to dedicated hardware)
  • Enhanced networking (up to 100 Gbps)
  • EBS-optimized by default
  • NVMe-based storage interface
  • Security: hardware root of trust, encrypted memory

On Nitro-based instance types, Linux exposes EBS volumes as NVMe devices (/dev/nvme1n1 and so on) instead of /dev/xvd*, regardless of the device name passed to attach-volume.
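
Since both EBS and instance store volumes show up as NVMe devices on Nitro, it helps to tell them apart before formatting anything. The sketch below reads the model string that Linux exposes in sysfs, which is "Amazon Elastic Block Store" for EBS and "Amazon EC2 NVMe Instance Storage" for instance store. `classify_nvme` and `SYSFS_ROOT` are names invented for this example.

```shell
#!/bin/sh
# classify_nvme is a hypothetical helper. It reads the NVMe model string
# from sysfs and labels each device as EBS or instance store.
# SYSFS_ROOT is overridable only so the function can be tested off-EC2.
classify_nvme() (
  root=${SYSFS_ROOT:-/sys}
  for model_file in "$root"/block/nvme*/device/model; do
    [ -r "$model_file" ] || continue        # glob matched nothing
    dev=${model_file#"$root"/block/}        # nvme0n1/device/model
    dev=${dev%%/*}                          # nvme0n1
    model=$(cat "$model_file")
    case $model in
      *"Elastic Block Store"*) kind=ebs ;;
      *"Instance Storage"*)    kind=instance-store ;;
      *)                       kind=unknown ;;
    esac
    echo "/dev/$dev $kind"
  done
)
```

Running `classify_nvme` on an instance prints one line per NVMe block device. Anything labeled `instance-store` is lost on stop, as described in the storage section.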

12. EC2 Instance Connect

A safer alternative to managing SSH key pairs. Pushes a temporary public key to the instance metadata for 60 seconds.

# Connect via CLI
aws ec2-instance-connect ssh --instance-id i-abc123

# Or push key and connect manually
aws ec2-instance-connect send-ssh-public-key \
  --instance-id i-abc123 \
  --instance-os-user ec2-user \
  --ssh-public-key file://~/.ssh/id_rsa.pub

# Connect within 60 seconds
ssh -i ~/.ssh/id_rsa ec2-user@<ip>

Key Takeaways

  • Use Graviton instances (g suffix) for 20-40% better price-performance unless you need x86
  • Instance store is ephemeral — data vanishes on stop/terminate/host failure
  • gp3 is the default EBS volume type — you can tune IOPS and throughput independently
  • Always enforce IMDSv2 (token-required) to prevent SSRF credential theft
  • T-family burstable instances have CPU credits — understand baseline vs burst
  • SSM Session Manager is preferred over SSH for production access
  • Spot instances save up to 90% but can be reclaimed with 2-minute notice
  • Auto Scaling Groups with launch templates are the modern standard for fleets
  • Stop/start changes the underlying host and public IP; reboot does not
