AWS S3 Deep Dive - Primer¶
Why This Matters¶
S3 is the backbone of AWS. Every service touches it: CloudTrail logs go to S3, Lambda deployment packages come from S3, EMR reads data from S3, backups land in S3. A misconfigured bucket policy can expose your company's data to the internet. A missing lifecycle rule can run up a six-figure storage bill. Understanding S3 deeply -- the object model, storage classes, security layers, and performance characteristics -- is foundational cloud operations knowledge.
S3 Object Model¶
S3 is a key-value object store, not a filesystem. There are no directories -- the / in a key is just a character.
Core concepts¶
| Concept | Description |
|---|---|
| Bucket | Globally unique container. Region-specific. |
| Key | The full "path" to an object (e.g., logs/2026/03/19/app.log.gz) |
| Object | The data + metadata. Max 5TB per object. |
| Metadata | System metadata (Content-Type, Last-Modified) + user metadata (x-amz-meta-*) |
| Version ID | Unique identifier for each version of an object (when versioning is enabled) |
| ETag | Hash of the object (an MD5 digest for single-part, non-KMS uploads; multipart ETags are not a plain MD5) |
# List objects in a bucket
aws s3 ls s3://my-bucket/logs/2026/03/ --recursive
# Get object metadata without downloading
aws s3api head-object --bucket my-bucket --key logs/2026/03/19/app.log.gz
# Output includes: ContentLength, ContentType, ETag, LastModified, StorageClass, VersionId
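Because `/` is just a character, the "folder" view in the console is produced by listing with a delimiter: keys directly under the prefix come back as objects, deeper keys collapse into common prefixes. A minimal local sketch of that grouping (a toy model, not the S3 API):

```python
def list_with_delimiter(keys, prefix="", delimiter="/"):
    """Simulate S3 delimiter listing: keys directly under `prefix` are
    returned as objects; deeper keys collapse into CommonPrefixes."""
    objects, common_prefixes = [], set()
    for key in keys:
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        if delimiter in rest:
            # Everything up to and including the first delimiter after the prefix
            common_prefixes.add(prefix + rest.split(delimiter, 1)[0] + delimiter)
        else:
            objects.append(key)
    return objects, sorted(common_prefixes)

keys = [
    "logs/2026/03/19/app.log.gz",
    "logs/2026/03/20/app.log.gz",
    "logs/README.txt",
]
objs, prefixes = list_with_delimiter(keys, prefix="logs/")
# objs -> ["logs/README.txt"], prefixes -> ["logs/2026/"]
```

This is what `aws s3 ls` (without `--recursive`) does under the hood via `ListObjectsV2` with `Delimiter=/`.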
Strong consistency model¶
As of December 2020, S3 provides strong read-after-write consistency for all operations:
- PUT a new object -> immediately readable
- PUT overwrite -> immediately returns the new version
- DELETE -> immediately reflects the deletion
- LIST -> immediately reflects recent writes and deletes
This replaced the previous eventual consistency model. You no longer need to worry about stale reads after writes.
Storage Classes¶
Choosing the right storage class is the biggest lever for S3 cost optimization.
| Class | Access Pattern | Availability | Min Duration | Retrieval Cost |
|---|---|---|---|---|
| Standard | Frequently accessed | 99.99% | None | None |
| Intelligent-Tiering | Unknown/changing | 99.9% | None | None (monitoring fee) |
| Standard-IA | Infrequent, immediate access | 99.9% | 30 days | Per-GB retrieval |
| One Zone-IA | Infrequent, one AZ is fine | 99.5% | 30 days | Per-GB retrieval |
| Glacier Instant Retrieval | Quarterly access, immediate | 99.9% | 90 days | Per-GB retrieval |
| Glacier Flexible Retrieval | 1-2 times/year, minutes-hours | 99.99% | 90 days | Per-GB + per-request |
| Glacier Deep Archive | Compliance/archival, rare | 99.99% | 180 days | Per-GB + per-request |
# Upload to a specific storage class (GLACIER = Glacier Flexible Retrieval)
aws s3 cp backup.tar.gz s3://my-bucket/backups/ --storage-class GLACIER
# Check current storage class
aws s3api head-object --bucket my-bucket --key backups/backup.tar.gz \
--query 'StorageClass'
# Change storage class (creates a copy)
aws s3 cp s3://my-bucket/data/file.csv s3://my-bucket/data/file.csv \
--storage-class STANDARD_IA
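The storage-vs-retrieval trade-off in the table above is simple arithmetic. A sketch of the comparison, using ballpark per-GB figures that are illustrative assumptions, not current pricing:

```python
# Illustrative monthly cost comparison. The per-GB prices below are
# rough us-east-1 ballpark figures (an assumption) -- always check
# current AWS pricing before making decisions.
PRICES = {  # class: (storage $/GB-month, retrieval $/GB)
    "STANDARD": (0.023, 0.0),
    "STANDARD_IA": (0.0125, 0.01),
    "GLACIER_IR": (0.004, 0.03),
    "DEEP_ARCHIVE": (0.00099, 0.02),
}

def monthly_cost(storage_class, gb_stored, gb_retrieved):
    store, retrieve = PRICES[storage_class]
    return gb_stored * store + gb_retrieved * retrieve

# 10 TB stored, 100 GB retrieved per month:
for cls in PRICES:
    print(cls, round(monthly_cost(cls, 10_000, 100), 2))
```

The pattern to notice: colder classes win when retrieval volume is low relative to stored volume; heavy retrieval can make Standard cheaper than IA.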
Intelligent-Tiering¶
Automatically moves objects between access tiers based on usage. No retrieval fees, but a small monthly monitoring fee per object. Good when you cannot predict access patterns.
# Upload with Intelligent-Tiering
aws s3 cp data.parquet s3://my-bucket/analytics/ --storage-class INTELLIGENT_TIERING
Tiers: Frequent Access -> Infrequent Access (30 days) -> Archive Instant Access (90 days) -> Archive Access (opt-in, 90 days) -> Deep Archive Access (opt-in, 180 days).
Lifecycle Policies¶
Automate transitions between storage classes and expiration of old objects.
# Apply a lifecycle configuration
aws s3api put-bucket-lifecycle-configuration \
--bucket my-bucket \
--lifecycle-configuration '{
"Rules": [
{
"ID": "archive-old-logs",
"Status": "Enabled",
"Filter": { "Prefix": "logs/" },
"Transitions": [
{ "Days": 30, "StorageClass": "STANDARD_IA" },
{ "Days": 90, "StorageClass": "GLACIER" },
{ "Days": 365, "StorageClass": "DEEP_ARCHIVE" }
],
"Expiration": { "Days": 2555 }
},
{
"ID": "cleanup-incomplete-uploads",
"Status": "Enabled",
"Filter": { "Prefix": "" },
"AbortIncompleteMultipartUpload": { "DaysAfterInitiation": 7 }
}
]
}'
# View current lifecycle rules
aws s3api get-bucket-lifecycle-configuration --bucket my-bucket
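The transition rules above are effectively a lookup from object age to storage class. A hypothetical helper (not part of any AWS SDK) that models the `archive-old-logs` rule:

```python
def storage_class_at(age_days, transitions, expiration_days=None):
    """Return the storage class an object is in at `age_days` under a
    lifecycle rule, or None once it has expired. `transitions` is a
    list of (days, storage_class) pairs, as in the JSON rule above."""
    if expiration_days is not None and age_days >= expiration_days:
        return None  # object deleted by the Expiration action
    current = "STANDARD"
    for days, cls in sorted(transitions):
        if age_days >= days:
            current = cls
    return current

# The archive-old-logs rule from the lifecycle configuration above:
rules = [(30, "STANDARD_IA"), (90, "GLACIER"), (365, "DEEP_ARCHIVE")]
assert storage_class_at(10, rules, 2555) == "STANDARD"
assert storage_class_at(120, rules, 2555) == "GLACIER"
assert storage_class_at(3000, rules, 2555) is None
```

Note that each transition must respect the minimum-duration charges from the storage class table (e.g., 30 days before IA, 90 before Glacier tiers).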
Versioning¶
Versioning keeps all versions of every object. Deletes do not remove the object -- they add a delete marker.
# Enable versioning
aws s3api put-bucket-versioning --bucket my-bucket \
--versioning-configuration Status=Enabled
# Check versioning status
aws s3api get-bucket-versioning --bucket my-bucket
# List all versions of an object
aws s3api list-object-versions --bucket my-bucket --prefix config/app.yaml
# Retrieve a specific version
aws s3api get-object --bucket my-bucket --key config/app.yaml \
--version-id "abc123xyz" old-app.yaml
# Permanently delete a specific version (bypasses delete marker)
aws s3api delete-object --bucket my-bucket --key config/app.yaml \
--version-id "abc123xyz"
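The delete-marker behavior is the part that surprises people: a plain DELETE hides the object but every version is still there (and still billed). A toy model of the semantics, not the real API:

```python
class VersionedBucket:
    """Toy model of S3 versioning: every PUT adds a version, DELETE adds
    a delete marker, and a plain GET returns the latest version -- or
    fails if the latest 'version' is a delete marker."""
    def __init__(self):
        self.versions = {}   # key -> list of (version_id, data or None)
        self.counter = 0

    def put(self, key, data):
        self.counter += 1
        vid = f"v{self.counter}"
        self.versions.setdefault(key, []).append((vid, data))
        return vid

    def delete(self, key):
        self.counter += 1
        # None stands in for a delete marker
        self.versions.setdefault(key, []).append((f"v{self.counter}", None))

    def get(self, key, version_id=None):
        for vid, data in reversed(self.versions.get(key, [])):
            if version_id in (None, vid):
                if data is None:
                    raise KeyError("NoSuchKey: latest version is a delete marker")
                return data
        raise KeyError("NoSuchKey")

b = VersionedBucket()
v1 = b.put("config/app.yaml", "replicas: 1")
b.put("config/app.yaml", "replicas: 2")
b.delete("config/app.yaml")
assert b.get("config/app.yaml", version_id=v1) == "replicas: 1"
# A plain GET now fails because of the delete marker -- but nothing is gone.
```

"Undeleting" in real S3 means deleting the delete marker itself (a versioned delete of the marker's version ID).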
MFA Delete¶
Requires MFA authentication to delete object versions or change versioning state. Only the root account can enable it.
# Enable MFA Delete (must be root account, with MFA device)
aws s3api put-bucket-versioning --bucket my-bucket \
--versioning-configuration Status=Enabled,MFADelete=Enabled \
--mfa "arn:aws:iam::123456789012:mfa/root-device 123456"
Bucket Policies, ACLs, and IAM¶
Three layers of access control, evaluated together: an explicit Deny in any applicable policy always wins; otherwise an Allow in either the identity-based IAM policy or the bucket policy grants access; with no matching Allow, the request is implicitly denied. ACLs are a legacy mechanism evaluated separately and best disabled.
Bucket policy (resource-based)¶
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "DenyUnencryptedUploads",
"Effect": "Deny",
"Principal": "*",
"Action": "s3:PutObject",
"Resource": "arn:aws:s3:::my-bucket/*",
"Condition": {
"StringNotEquals": {
"s3:x-amz-server-side-encryption": "aws:kms"
}
}
},
{
"Sid": "AllowCrossAccountRead",
"Effect": "Allow",
"Principal": { "AWS": "arn:aws:iam::987654321098:root" },
"Action": ["s3:GetObject", "s3:ListBucket"],
"Resource": [
"arn:aws:s3:::my-bucket",
"arn:aws:s3:::my-bucket/*"
]
}
]
}
# Apply bucket policy
aws s3api put-bucket-policy --bucket my-bucket --policy file://policy.json
# View bucket policy
aws s3api get-bucket-policy --bucket my-bucket | jq '.Policy | fromjson'
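The deny-wins, union-of-allows evaluation can be sketched as a toy model (real IAM also evaluates conditions, SCPs, permission boundaries, and session policies, none of which are modeled here):

```python
def is_allowed(iam_statements, bucket_statements, action, resource):
    """Toy model of S3 authorization across identity and resource
    policies: any explicit Deny wins; otherwise an Allow in EITHER
    policy grants access; no match means implicit deny. Resource
    matching here is exact-string only -- no wildcard expansion."""
    statements = iam_statements + bucket_statements
    matches = [s for s in statements
               if action in s["actions"] and resource in s["resources"]]
    if any(s["effect"] == "Deny" for s in matches):
        return False
    return any(s["effect"] == "Allow" for s in matches)

iam = [{"effect": "Allow", "actions": ["s3:GetObject"],
        "resources": ["arn:aws:s3:::my-bucket/*"]}]
bucket = [{"effect": "Deny", "actions": ["s3:GetObject"],
           "resources": ["arn:aws:s3:::my-bucket/*"]}]

# IAM allows, but the bucket policy's explicit Deny wins:
assert not is_allowed(iam, bucket, "s3:GetObject", "arn:aws:s3:::my-bucket/*")
# With no Deny in play, the IAM Allow alone is sufficient:
assert is_allowed(iam, [], "s3:GetObject", "arn:aws:s3:::my-bucket/*")
```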
Block Public Access (account-wide and per-bucket)¶
# Check public access block settings
aws s3api get-public-access-block --bucket my-bucket
# Enable all public access blocks
aws s3api put-public-access-block --bucket my-bucket \
--public-access-block-configuration \
BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true
# Account-level (applies to all buckets)
aws s3control put-public-access-block --account-id 123456789012 \
--public-access-block-configuration \
BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true
ACLs (legacy, avoid)¶
ACLs are the old access control model. AWS recommends disabling them entirely via the "Bucket owner enforced" object ownership setting.
# Disable ACLs (recommended)
aws s3api put-bucket-ownership-controls --bucket my-bucket \
--ownership-controls '{"Rules": [{"ObjectOwnership": "BucketOwnerEnforced"}]}'
S3 Encryption¶
Server-side encryption options¶
| Type | Key Management | When to Use |
|---|---|---|
| SSE-S3 | AWS-managed keys | Default, simplest |
| SSE-KMS | AWS KMS (your CMK or aws/s3 key) | Audit trail, key rotation, cross-account |
| SSE-C | Customer-provided key per request | You manage keys entirely |
| Client-side | Encrypt before upload | End-to-end, AWS never sees plaintext |
# Default encryption for bucket (SSE-S3)
aws s3api put-bucket-encryption --bucket my-bucket \
--server-side-encryption-configuration '{
"Rules": [{"ApplyServerSideEncryptionByDefault": {"SSEAlgorithm": "AES256"}}]
}'
# Default encryption (SSE-KMS)
aws s3api put-bucket-encryption --bucket my-bucket \
--server-side-encryption-configuration '{
"Rules": [{
"ApplyServerSideEncryptionByDefault": {
"SSEAlgorithm": "aws:kms",
"KMSMasterKeyID": "arn:aws:kms:us-east-1:123456789012:key/my-key-id"
},
"BucketKeyEnabled": true
}]
}'
# BucketKeyEnabled reduces KMS request costs by using a bucket-level key
Presigned URLs¶
Generate temporary URLs that grant time-limited access to private objects without requiring AWS credentials.
# Generate presigned URL for download (default 1 hour)
aws s3 presign s3://my-bucket/reports/quarterly.pdf --expires-in 3600
# Note: the aws s3 presign command only generates download (GET) URLs.
# To presign uploads (PUT), use an SDK such as boto3:
import boto3
s3 = boto3.client("s3")
# Download URL
url = s3.generate_presigned_url(
"get_object",
Params={"Bucket": "my-bucket", "Key": "reports/quarterly.pdf"},
ExpiresIn=3600,
)
# Upload URL
url = s3.generate_presigned_url(
"put_object",
Params={"Bucket": "my-bucket", "Key": "uploads/incoming.csv", "ContentType": "text/csv"},
ExpiresIn=900,
)
Multipart Uploads¶
Required for objects larger than 5GB. Recommended for anything over 100MB. Allows parallel upload of parts and resumability.
# aws s3 cp automatically uses multipart for large files
aws s3 cp large-backup.tar.gz s3://my-bucket/backups/ \
--expected-size 10737418240
# Manual multipart with s3api
# Step 1: Initiate
UPLOAD_ID=$(aws s3api create-multipart-upload --bucket my-bucket \
--key backups/large.tar.gz --query 'UploadId' --output text)
# Step 2: Upload parts
aws s3api upload-part --bucket my-bucket --key backups/large.tar.gz \
--part-number 1 --body part1.bin --upload-id "$UPLOAD_ID"
# Step 3: Complete
aws s3api complete-multipart-upload --bucket my-bucket \
--key backups/large.tar.gz --upload-id "$UPLOAD_ID" \
--multipart-upload '{"Parts": [{"PartNumber": 1, "ETag": "\"abc123\""}]}'
# List incomplete multipart uploads (these cost storage!)
aws s3api list-multipart-uploads --bucket my-bucket
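Multipart uploads also explain the odd ETags mentioned earlier: for a multipart object, the ETag is the MD5 of the concatenated binary MD5 digests of each part, suffixed with `-<part count>`. This is observed behavior rather than a documented contract, but it is widely used to verify local files against S3:

```python
import hashlib

def multipart_etag(data: bytes, part_size: int) -> str:
    """Compute the ETag S3 assigns to a multipart upload: the hex MD5
    of the concatenated binary MD5s of each part, plus '-<n parts>'.
    Observed S3 behavior (non-KMS), not an official guarantee."""
    part_digests = [
        hashlib.md5(data[i:i + part_size]).digest()
        for i in range(0, len(data), part_size)
    ]
    combined = hashlib.md5(b"".join(part_digests)).hexdigest()
    return f"{combined}-{len(part_digests)}"

# 20 MB uploaded in 8 MB parts -> 3 parts, so the ETag ends in "-3"
etag = multipart_etag(b"\x00" * (20 * 1024 * 1024), 8 * 1024 * 1024)
assert etag.endswith("-3")
```

To match, you must use the same part size the original upload used (the CLI default is 8 MB, configurable via `multipart_chunksize`).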
S3 Transfer Acceleration¶
Uses CloudFront edge locations to speed up uploads from distant clients. Adds a per-GB charge.
# Enable Transfer Acceleration
aws s3api put-bucket-accelerate-configuration --bucket my-bucket \
--accelerate-configuration Status=Enabled
# Upload using acceleration
aws s3 cp large-file.gz s3://my-bucket/data/ --region us-east-1 \
 --endpoint-url https://s3-accelerate.amazonaws.com
Replication¶
Cross-Region Replication (CRR) and Same-Region Replication (SRR)¶
# Requires versioning on both source and destination buckets
aws s3api put-bucket-replication --bucket source-bucket \
--replication-configuration '{
"Role": "arn:aws:iam::123456789012:role/s3-replication-role",
"Rules": [{
"ID": "replicate-all",
"Status": "Enabled",
"Filter": {},
"Destination": {
"Bucket": "arn:aws:s3:::dest-bucket",
"StorageClass": "STANDARD_IA"
},
"DeleteMarkerReplication": { "Status": "Enabled" }
}]
}'
Replication is asynchronous. It does not replicate existing objects (use S3 Batch Operations for that). It does not replicate objects encrypted with SSE-C.
S3 Event Notifications¶
Trigger actions on object events (create, delete, restore).
# Send notifications to SQS on object creation
aws s3api put-bucket-notification-configuration --bucket my-bucket \
--notification-configuration '{
"QueueConfigurations": [{
"QueueArn": "arn:aws:sqs:us-east-1:123456789012:s3-events",
"Events": ["s3:ObjectCreated:*"],
"Filter": {
"Key": { "FilterRules": [{"Name": "prefix", "Value": "uploads/"}] }
}
}],
"EventBridgeConfiguration": {}
}'
Targets: Lambda, SQS, SNS, EventBridge. EventBridge is the most flexible (filter on metadata, route to many targets).
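Whatever the target, consumers receive a JSON document with a `Records` array. One gotcha worth encoding: object keys arrive URL-encoded (a space becomes `+`). A sketch of a parser a Lambda or SQS consumer might use (the event shape below is the standard S3 notification format; the function name is hypothetical):

```python
from urllib.parse import unquote_plus

def parse_s3_event(event: dict):
    """Extract (bucket, key, event name) tuples from an S3 event
    notification. Keys arrive URL-encoded, so decode them before use."""
    out = []
    for record in event.get("Records", []):
        s3 = record["s3"]
        out.append((
            s3["bucket"]["name"],
            unquote_plus(s3["object"]["key"]),  # "my+report.csv" -> "my report.csv"
            record["eventName"],
        ))
    return out

sample = {"Records": [{
    "eventName": "ObjectCreated:Put",
    "s3": {"bucket": {"name": "my-bucket"},
           "object": {"key": "uploads/my+report.csv"}},
}]}
assert parse_s3_event(sample) == [
    ("my-bucket", "uploads/my report.csv", "ObjectCreated:Put")
]
```

Forgetting the decode step is a classic bug: the consumer then calls GetObject with the encoded key and gets NoSuchKey.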
S3 Select and S3 Object Lock¶
S3 Select¶
Query CSV, JSON, or Parquet objects in-place without downloading the full object.
# Query a CSV file
aws s3api select-object-content \
--bucket my-bucket --key data/sales.csv \
--expression "SELECT s.product, s.revenue FROM s3object s WHERE CAST(s.revenue AS FLOAT) > 10000" \
--expression-type SQL \
--input-serialization '{"CSV": {"FileHeaderInfo": "USE"}}' \
--output-serialization '{"CSV": {}}' output.csv
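The CAST matters because CSV fields are strings; comparing `s.revenue > '10000'` would sort lexicographically. The filter the query expresses, sketched locally with sample data for intuition:

```python
import csv
import io

# Sample data standing in for data/sales.csv (an assumption for illustration)
data = """product,revenue
widget,25000
gadget,8000
gizmo,12000
"""

# The same predicate as the S3 Select query: revenue compared numerically
rows = [
    (r["product"], r["revenue"])
    for r in csv.DictReader(io.StringIO(data))
    if float(r["revenue"]) > 10000
]
# rows -> [("widget", "25000"), ("gizmo", "12000")]
```

S3 Select charges for data scanned and returned, so it pays off mainly when the predicate is selective on large objects.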
S3 Object Lock (WORM)¶
Write Once Read Many. Prevents object deletion or overwrite for a retention period. Commonly used to meet regulatory retention requirements such as SEC Rule 17a-4(f) and FINRA 4511.
# Enable Object Lock (must be set at bucket creation)
aws s3api create-bucket --bucket compliance-bucket \
--object-lock-enabled-for-bucket
# Set default retention
aws s3api put-object-lock-configuration --bucket compliance-bucket \
--object-lock-configuration '{
"ObjectLockEnabled": "Enabled",
"Rule": {
"DefaultRetention": { "Mode": "COMPLIANCE", "Years": 7 }
}
}'
Two modes: Governance (can be overridden with special permission) and Compliance (cannot be overridden by anyone, including root).
Performance Optimization¶
S3 supports at least 3,500 PUT/COPY/POST/DELETE and 5,500 GET/HEAD requests per second per partitioned prefix.
Prefix partitioning¶
# BAD: all objects under one prefix
s3://bucket/data/file001.csv
s3://bucket/data/file002.csv
# GOOD: distribute across prefixes for high throughput
s3://bucket/data/aa/file001.csv
s3://bucket/data/ab/file002.csv
# Or use date-based partitioning
s3://bucket/data/2026/03/19/file001.csv
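A common way to get the "GOOD" layout is to derive a short, deterministic hash prefix from each key, so writes spread evenly across many prefixes (each of which gets its own request quota). A sketch of that convention (the helper name is hypothetical):

```python
import hashlib

def partitioned_key(key: str, width: int = 2) -> str:
    """Prepend a short, deterministic hash prefix to a key so objects
    spread across many S3 prefixes. width=2 hex chars gives 256
    possible prefixes -- plenty for most throughput needs."""
    prefix = hashlib.md5(key.encode()).hexdigest()[:width]
    return f"{prefix}/{key}"

# Same key always maps to the same prefix, so reads can recompute it:
k = partitioned_key("data/file001.csv")
```

The trade-off versus date-based partitioning: hash prefixes balance load perfectly but make range listings (e.g., "all of March") impossible without an index.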
S3 Inventory¶
Scheduled report of all objects in a bucket. Much faster than LIST for large buckets.
aws s3api put-bucket-inventory-configuration --bucket my-bucket \
--id weekly-inventory \
--inventory-configuration '{
"Id": "weekly-inventory",
"IsEnabled": true,
"Destination": {
"S3BucketDestination": {
"Bucket": "arn:aws:s3:::inventory-bucket",
"Format": "CSV",
"AccountId": "123456789012"
}
},
"Schedule": { "Frequency": "Weekly" },
"IncludedObjectVersions": "Current",
"OptionalFields": ["Size", "StorageClass", "LastModifiedDate", "EncryptionStatus"]
}'
S3 Storage Lens¶
Account-level and organization-level analytics dashboard for storage usage, cost, and activity metrics. Enable via the S3 console or aws s3control put-storage-lens-configuration. Provides daily metrics on storage, requests, errors, and cost optimization opportunities across all buckets.
Wiki Navigation¶
Prerequisites¶
- Cloud Ops Basics (Topic Pack, L1)
- AWS IAM (Topic Pack, L1)
Related Content¶
- AWS CloudWatch (Topic Pack, L2) — Cloud Deep Dive
- AWS Devops Flashcards (CLI) (flashcard_deck, L1) — Cloud Deep Dive
- AWS EC2 (Topic Pack, L1) — Cloud Deep Dive
- AWS ECS (Topic Pack, L2) — Cloud Deep Dive
- AWS General Flashcards (CLI) (flashcard_deck, L1) — Cloud Deep Dive
- AWS IAM (Topic Pack, L1) — Cloud Deep Dive
- AWS Lambda (Topic Pack, L2) — Cloud Deep Dive
- AWS Networking (Topic Pack, L1) — Cloud Deep Dive
- AWS Route 53 (Topic Pack, L2) — Cloud Deep Dive
- AWS Storage Flashcards (CLI) (flashcard_deck, L1) — AWS S3 Deep Dive