
AWS S3 Deep Dive - Street-Level Ops

Real-world workflows for operating S3: bulk transfers, access debugging, cost investigation, and event-driven automation.

aws s3 vs aws s3api

Two command sets in one CLI, with different purposes.

# aws s3: high-level, human-friendly, handles multipart/recursion automatically
aws s3 cp local-file.tar.gz s3://my-bucket/backups/
aws s3 sync ./logs/ s3://my-bucket/logs/ --delete
aws s3 ls s3://my-bucket/logs/ --recursive --human-readable --summarize

# aws s3api: low-level, 1:1 with the S3 API, full control
aws s3api get-object --bucket my-bucket --key config.yaml --version-id abc123 output.yaml
aws s3api put-bucket-lifecycle-configuration --bucket my-bucket --lifecycle-configuration file://lifecycle.json
aws s3api list-object-versions --bucket my-bucket --prefix config.yaml

Rule of thumb: aws s3 for data movement, aws s3api for bucket configuration and metadata operations.

Bulk Operations

Sync a directory tree

# Upload new/changed files, optionally delete removed ones
aws s3 sync ./build/ s3://my-bucket/static/ \
  --exclude "*.tmp" \
  --exclude ".git/*" \
  --delete \
  --storage-class STANDARD_IA

# Dry run first
aws s3 sync ./build/ s3://my-bucket/static/ --dryrun

# Download everything from S3
aws s3 sync s3://my-bucket/data/ ./local-data/ --exact-timestamps

Recursive copy with filters

# Copy all .log.gz files from a date range
aws s3 cp s3://my-bucket/logs/ ./local-logs/ --recursive \
  --exclude "*" --include "2026/03/*/app.log.gz"

# Copy between buckets (cross-region)
aws s3 cp s3://source-bucket/ s3://dest-bucket/ --recursive \
  --source-region us-east-1 --region eu-west-1

# Copy with metadata override
aws s3 cp s3://my-bucket/assets/ s3://my-bucket/assets/ --recursive \
  --metadata-directive REPLACE \
  --cache-control "max-age=86400" \
  --content-type "application/javascript"

Debugging Access Denied

S3 "Access Denied" is one of the most common and frustrating AWS errors. Several layers can deny you: IAM policies, the bucket policy, VPC endpoint policies, KMS key policies, and object ACLs.

Step 1: Confirm your identity

aws sts get-caller-identity
# {
#   "UserId": "AROA...:session-name",
#   "Account": "123456789012",
#   "Arn": "arn:aws:sts::123456789012:assumed-role/my-role/session-name"
# }

Step 2: Check bucket policy

aws s3api get-bucket-policy --bucket target-bucket 2>/dev/null | jq '.Policy | fromjson'
# Look for explicit Deny statements matching your principal
# Look for Condition keys (VPC endpoint, source IP, encryption requirements)

Step 3: Check Block Public Access

aws s3api get-public-access-block --bucket target-bucket
# If BlockPublicPolicy=true and your policy grants public access, it's blocked

Step 4: Check IAM policies on your role/user

# List attached policies
aws iam list-attached-role-policies --role-name my-role
aws iam list-role-policies --role-name my-role  # Inline policies

# Simulate whether your role can perform the action
aws iam simulate-principal-policy \
  --policy-source-arn "arn:aws:iam::123456789012:role/my-role" \
  --action-names "s3:GetObject" \
  --resource-arns "arn:aws:s3:::target-bucket/data/file.csv"

Step 5: Check VPC endpoint policy (if in a VPC)

aws ec2 describe-vpc-endpoints \
  --filters "Name=service-name,Values=com.amazonaws.us-east-1.s3" \
  --query 'VpcEndpoints[].PolicyDocument'

Step 6: Check object ownership (cross-account)

# If the object was uploaded by another account and ACLs are in play
aws s3api get-object-acl --bucket target-bucket --key data/file.csv
aws s3api get-bucket-ownership-controls --bucket target-bucket

Common Access Denied causes

Cause -- How to identify
Wrong account/role -- sts get-caller-identity shows an unexpected principal
Bucket policy explicit Deny -- policy has an "Effect": "Deny" statement matching your action
Missing IAM permission -- simulate-principal-policy returns "implicitDeny"
VPC endpoint policy -- requests succeed from the internet but fail from private subnets
SSE-KMS key policy -- you also need kms:Decrypt on the object's KMS key
Object owned by another account -- BucketOwnerEnforced not set and the object ACL restricts access
Block Public Access -- blocks any policy granting public access

Debug clue: When troubleshooting S3 Access Denied, always start with aws sts get-caller-identity. Remarkably often, the caller is assuming a different role than expected (a session expired and fell back to the instance profile, or the wrong profile is active in your shell). Confirm identity first, then investigate policies.
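That first check is scriptable. A minimal stdlib sketch (role_from_arn and check_identity are illustrative names, not part of any SDK) that compares the active principal against the role you expect:

```python
import json
import subprocess

def role_from_arn(arn: str) -> str:
    """Extract the role name from an assumed-role STS ARN.

    arn:aws:sts::123456789012:assumed-role/my-role/session -> my-role
    """
    resource = arn.split(":", 5)[5]        # everything after the 5th colon
    parts = resource.split("/")
    if parts[0] == "assumed-role" and len(parts) >= 2:
        return parts[1]
    return resource                        # IAM user or root: return as-is

def check_identity(expected_role: str) -> bool:
    """Shell out to the CLI and compare the active role to expectations."""
    out = subprocess.run(
        ["aws", "sts", "get-caller-identity", "--output", "json"],
        capture_output=True, text=True, check=True,
    )
    return role_from_arn(json.loads(out.stdout)["Arn"]) == expected_role
```

Calling check_identity("my-role") at the top of any script that assumes a particular role fails fast and saves a policy-archaeology session later.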

Investigating Unexpected Costs

Quick cost check

# Storage usage by storage class
aws s3api list-buckets --query 'Buckets[].Name' --output text | tr '\t' '\n' | while read bucket; do
  echo "=== $bucket ==="
  aws cloudwatch get-metric-statistics \
    --namespace AWS/S3 --metric-name BucketSizeBytes \
    --dimensions Name=BucketName,Value="$bucket" Name=StorageType,Value=StandardStorage \
    --start-time "$(date -u -d '1 day ago' +%Y-%m-%dT%H:%M:%S)" \
    --end-time "$(date -u +%Y-%m-%dT%H:%M:%S)" \
    --period 86400 --statistics Average \
    --query 'Datapoints[0].Average' --output text
done

# Number of objects per bucket
aws cloudwatch get-metric-statistics \
  --namespace AWS/S3 --metric-name NumberOfObjects \
  --dimensions Name=BucketName,Value=my-bucket Name=StorageType,Value=AllStorageTypes \
  --start-time "$(date -u -d '1 day ago' +%Y-%m-%dT%H:%M:%S)" \
  --end-time "$(date -u +%Y-%m-%dT%H:%M:%S)" \
  --period 86400 --statistics Average
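To turn a BucketSizeBytes datapoint into dollars, a back-of-the-envelope helper. The default rate is illustrative only (roughly S3 Standard in us-east-1, first tier); substitute your region's current price:

```python
def monthly_storage_cost(size_bytes: float, price_per_gb_month: float = 0.023) -> float:
    """Rough monthly cost for a BucketSizeBytes value.

    0.023 USD/GB-month approximates S3 Standard in us-east-1;
    pass your own rate for other regions or storage classes.
    """
    gb = size_bytes / (1024 ** 3)
    return round(gb * price_per_gb_month, 2)

# 5 TB of Standard storage:
print(monthly_storage_cost(5 * 1024 ** 4))
```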

Common cost surprises

# 1. Incomplete multipart uploads consuming storage
aws s3api list-multipart-uploads --bucket my-bucket \
  --query 'Uploads[?Initiated<`2026-03-01`].[Key,UploadId,Initiated]' --output table

# Abort old incomplete uploads
aws s3api list-multipart-uploads --bucket my-bucket \
  --query 'Uploads[].[Key,UploadId]' --output text | while read key id; do
  aws s3api abort-multipart-upload --bucket my-bucket \
    --key "$key" --upload-id "$id"
done

# 2. Old versions consuming storage (versioned bucket)
aws s3api list-object-versions --bucket my-bucket \
  --query 'length(Versions[?IsLatest==`false`])'

# 3. Delete markers piling up
aws s3api list-object-versions --bucket my-bucket \
  --query 'length(DeleteMarkers)'

# 4. LIST requests cost money at scale ($0.005 per 1,000 requests)
#    A recursive aws s3 ls on a bucket with 10M objects makes ~10,000 LIST calls,
#    about $0.05 per full listing -- trivial once, but a script listing every
#    minute costs over $2,000 a month
# S3 server access logging shows request types:
aws s3api get-bucket-logging --bucket my-bucket
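The request-cost arithmetic is worth making explicit. The $0.005-per-1,000-requests rate is the long-standing us-east-1 LIST price; verify for your region:

```python
import math

def full_listing_cost(num_objects: int,
                      price_per_1000_requests: float = 0.005,
                      keys_per_page: int = 1000) -> float:
    """USD cost of one complete recursive listing of a bucket.

    Each LIST call returns at most keys_per_page keys, so a full
    listing needs ceil(num_objects / keys_per_page) calls.
    """
    calls = math.ceil(num_objects / keys_per_page)
    return calls * price_per_1000_requests / 1000

# One full listing of 10M objects: ~10,000 LIST calls, ~$0.05
print(full_listing_cost(10_000_000))

# ...but a cron job listing every minute, all month:
print(round(full_listing_cost(10_000_000) * 60 * 24 * 30, 2))
```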

Setting Up Lifecycle Rules for Cost Optimization

# Template: production logs lifecycle
aws s3api put-bucket-lifecycle-configuration --bucket prod-logs \
  --lifecycle-configuration '{
    "Rules": [
      {
        "ID": "transition-old-logs",
        "Status": "Enabled",
        "Filter": { "Prefix": "logs/" },
        "Transitions": [
          { "Days": 30, "StorageClass": "STANDARD_IA" },
          { "Days": 90, "StorageClass": "GLACIER" }
        ],
        "Expiration": { "Days": 365 }
      },
      {
        "ID": "expire-old-versions",
        "Status": "Enabled",
        "Filter": { "Prefix": "" },
        "NoncurrentVersionTransitions": [
          { "NoncurrentDays": 30, "StorageClass": "GLACIER" }
        ],
        "NoncurrentVersionExpiration": { "NoncurrentDays": 90 }
      },
      {
        "ID": "abort-incomplete-multipart",
        "Status": "Enabled",
        "Filter": { "Prefix": "" },
        "AbortIncompleteMultipartUpload": { "DaysAfterInitiation": 3 }
      },
      {
        "ID": "expire-delete-markers",
        "Status": "Enabled",
        "Filter": { "Prefix": "" },
        "Expiration": { "ExpiredObjectDeleteMarker": true }
      }
    ]
  }'
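A quick way to sanity-check the transition-old-logs schedule is to model it as a function of object age. This is a simplification: real transitions run on S3's own schedule shortly after each threshold passes:

```python
def storage_class_at(age_days: int) -> str:
    """Storage class of a logs/ object at a given age, per the
    transition-old-logs rule above (simplified model)."""
    if age_days >= 365:
        return "EXPIRED"       # Expiration: Days 365
    if age_days >= 90:
        return "GLACIER"       # second transition
    if age_days >= 30:
        return "STANDARD_IA"   # first transition
    return "STANDARD"
```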

Cross-Account Access Patterns

Pattern 1: Bucket policy grants access to another account's role

# On bucket-owning account: bucket policy
aws s3api put-bucket-policy --bucket shared-data --policy '{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": { "AWS": "arn:aws:iam::987654321098:role/data-reader" },
    "Action": ["s3:GetObject", "s3:ListBucket"],
    "Resource": ["arn:aws:s3:::shared-data", "arn:aws:s3:::shared-data/*"]
  }]
}'

# On accessing account: role must also have IAM policy allowing s3:GetObject
# Both sides must allow -- bucket policy AND IAM policy

Pattern 2: S3 Access Points for per-team access

# Create access point with specific policy
aws s3control create-access-point --account-id 123456789012 \
  --name team-analytics --bucket shared-data

# Team accesses via access point ARN
aws s3 ls s3://arn:aws:s3:us-east-1:123456789012:accesspoint/team-analytics/data/

Presigned URL Generation for Temporary Access

# Download link valid for 1 hour
aws s3 presign s3://my-bucket/reports/q1-revenue.pdf --expires-in 3600
# https://my-bucket.s3.amazonaws.com/reports/q1-revenue.pdf?X-Amz-Algorithm=...

# For uploads, use s3api with presigned POST (more control)
python3 -c "
import boto3, json
s3 = boto3.client('s3')
post = s3.generate_presigned_post(
    'my-bucket', 'uploads/\${filename}',  # S3 substitutes the client-supplied filename
    Conditions=[['content-length-range', 1, 104857600]],
    ExpiresIn=900
)
print(json.dumps(post, indent=2))
"
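A SigV4 presigned URL carries its own lifetime in the query string (X-Amz-Date, the signing time in UTC, plus X-Amz-Expires, the lifetime in seconds), so a client can tell whether a link has expired without calling S3. A stdlib sketch:

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import urlparse, parse_qs

def presigned_url_expired(url: str, now=None) -> bool:
    """True if a SigV4 presigned URL is past X-Amz-Date + X-Amz-Expires."""
    qs = parse_qs(urlparse(url).query)
    signed_at = datetime.strptime(
        qs["X-Amz-Date"][0], "%Y%m%dT%H%M%SZ"
    ).replace(tzinfo=timezone.utc)
    lifetime = timedelta(seconds=int(qs["X-Amz-Expires"][0]))
    now = now or datetime.now(timezone.utc)
    return now > signed_at + lifetime
```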

Large File Upload Strategies

# aws s3 cp handles multipart automatically
# Configure thresholds for optimal performance
aws configure set default.s3.multipart_threshold 64MB
aws configure set default.s3.multipart_chunksize 16MB
aws configure set default.s3.max_concurrent_requests 20

# Upload a 50GB file with progress
aws s3 cp massive-backup.tar.gz s3://my-bucket/backups/ --expected-size 53687091200
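The chunk-size settings interact with S3's hard limit of 10,000 parts per multipart upload. Some quick arithmetic shows the 50 GB upload is comfortable at 16 MB parts, and when the transfer layer has to grow the part size:

```python
import math

MAX_PARTS = 10_000                     # S3 hard limit per multipart upload

def part_count(size_bytes: int, chunk_mb: int = 16) -> int:
    """Number of parts at a given chunk size."""
    return math.ceil(size_bytes / (chunk_mb * 1024 * 1024))

def min_chunk_mb(size_bytes: int) -> int:
    """Smallest whole-MB chunk size that stays under MAX_PARTS."""
    return math.ceil(size_bytes / MAX_PARTS / (1024 * 1024))

# The 50 GB upload above with 16 MB parts: 3200 parts, well under the limit
print(part_count(53_687_091_200))

# A 500 GB file would need 32,000 parts at 16 MB -- the transfer
# layer must raise the chunk size (to at least 52 MB here)
print(min_chunk_mb(500 * 1024 ** 3))
```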

# Resume a failed upload (aws s3 cp does NOT resume -- use s3api multipart)
# Step 1: List incomplete uploads
aws s3api list-multipart-uploads --bucket my-bucket

# Step 2: List uploaded parts
aws s3api list-parts --bucket my-bucket --key backups/massive-backup.tar.gz \
  --upload-id "$UPLOAD_ID"

# Step 3: Upload missing parts and complete
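Steps 2 and 3 amount to diffing what list-parts reports against what you expect. A small helper (illustrative; assumes you know the intended part count from the file size and chunk size) computes the gap:

```python
def missing_parts(list_parts_response: dict, total_parts: int) -> list:
    """Part numbers still to upload, given a ListParts response and the
    expected total. After uploading them, call complete-multipart-upload
    with every part's ETag."""
    done = {p["PartNumber"] for p in list_parts_response.get("Parts", [])}
    return [n for n in range(1, total_parts + 1) if n not in done]
```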

S3 as Static Website Hosting

# Enable static website hosting
aws s3 website s3://my-website-bucket/ \
  --index-document index.html --error-document error.html

# Bucket policy for public read (only after disabling Block Public Access)
aws s3api put-bucket-policy --bucket my-website-bucket --policy '{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": "*",
    "Action": "s3:GetObject",
    "Resource": "arn:aws:s3:::my-website-bucket/*"
  }]
}'

# Better: use CloudFront in front of S3 with OAC (Origin Access Control)
# No public bucket access needed -- CloudFront authenticates to S3 directly

Bucket Notifications for Event-Driven Workflows

# Configure notifications to Lambda + EventBridge
aws s3api put-bucket-notification-configuration --bucket uploads-bucket \
  --notification-configuration '{
    "LambdaFunctionConfigurations": [{
      "LambdaFunctionArn": "arn:aws:lambda:us-east-1:123456789012:function:process-upload",
      "Events": ["s3:ObjectCreated:*"],
      "Filter": {
        "Key": {
          "FilterRules": [
            { "Name": "prefix", "Value": "incoming/" },
            { "Name": "suffix", "Value": ".csv" }
          ]
        }
      }
    }],
    "EventBridgeConfiguration": {}
  }'

# Verify the configuration
aws s3api get-bucket-notification-configuration --bucket uploads-bucket

# Test by uploading a file
aws s3 cp test.csv s3://uploads-bucket/incoming/test.csv

# Check Lambda invocation
aws logs filter-log-events --log-group-name /aws/lambda/process-upload \
  --start-time "$(date -d '5 minutes ago' +%s)000"
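On the Lambda side, a skeleton handler for the event shape S3 delivers (real events carry many more fields than shown). One gotcha: object keys arrive URL-encoded, so a key with spaces comes through with '+' characters and must be decoded:

```python
from urllib.parse import unquote_plus

def handler(event, context=None):
    """Minimal handler for S3 ObjectCreated notifications: collect the
    (bucket, key) pairs, decoding the URL-encoded keys."""
    processed = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = unquote_plus(record["s3"]["object"]["key"])
        processed.append((bucket, key))
    return processed
```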

S3 Access Logging

# Enable server access logging
aws s3api put-bucket-logging --bucket my-bucket --bucket-logging-status '{
  "LoggingEnabled": {
    "TargetBucket": "access-logs-bucket",
    "TargetPrefix": "s3-logs/my-bucket/"
  }
}'

# Verify logging is enabled
aws s3api get-bucket-logging --bucket my-bucket
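Once logs land, each line is space-delimited with quoted strings and a bracketed timestamp. A minimal parser for the fields you usually need when debugging (field positions follow the published access log format):

```python
import re

# One token per field: bracketed timestamp, quoted string, or bare word
_FIELD = re.compile(r'\[[^\]]*\]|"[^"]*"|\S+')

def parse_access_log_line(line: str) -> dict:
    """Extract the most debugging-relevant fields from an S3 server
    access log line: bucket, time, requester, operation, key, status."""
    f = _FIELD.findall(line)
    return {
        "bucket": f[1],
        "time": f[2].strip("[]"),
        "requester": f[4],
        "operation": f[6],
        "key": f[7],
        "status": f[9],
    }
```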

Gotcha: S3 server access logs are delivered on a best-effort basis with typical delays of a few minutes to several hours. Do not rely on them for real-time alerting or forensics. For near-real-time access tracking, enable CloudTrail data events for S3 -- they arrive within ~5 minutes and are queryable via Athena.