S3-Compatible Object Storage — Primer¶
Why This Matters¶
Object storage (S3, MinIO, Ceph RGW) is the standard backend for artifact storage, backups, log archiving, data lake landing zones, and modern infrastructure components like Loki, Thanos, Velero, and Harbor. Understanding the S3 API, how to operate MinIO, how to configure access policies, and how to tune for performance lets you confidently run object storage anywhere — on-prem, in Kubernetes, or in the cloud — and integrate it with any S3-aware tool.
Core Concepts¶
1. Object Storage Model¶
Name origin: S3 stands for "Simple Storage Service" — launched by AWS in March 2006, it was one of the first commercially available cloud services. Its HTTP-based REST API became the de facto standard, which is why MinIO, Ceph RGW, and dozens of other implementations speak "S3-compatible."
Object storage has no filesystem hierarchy. There are no directories — only buckets and objects. "Folders" in S3-compatible UIs are just a naming convention where key prefixes contain /.
Bucket: my-artifacts
Objects:
builds/2024/01/15/app-v1.2.3.tar.gz ← full object key
builds/2024/01/15/checksums.sha256
backups/daily/2024-01-15.tar.gz
logs/app/2024-01-15/access.log.gz
Key concepts:
- Bucket: flat namespace container. Names must be unique within the endpoint (globally unique on AWS S3).
- Object: a blob of bytes (0 bytes up to 5 GB in a single PUT; up to 5 TB via multipart upload) plus metadata.
- Key: the full path string that identifies an object within a bucket. / has no special filesystem meaning; it's just a character.
- Metadata: HTTP headers attached to objects (Content-Type, custom x-amz-meta- headers, ETag, Last-Modified).
- ETag: MD5 of the object for a single-part upload; for multipart, the MD5 of the concatenated part digests followed by -N (the part count). Useful for integrity verification, but only a plain MD5 for unencrypted single-part objects.
No directories means you can't atomically rename a "directory": you must copy every object under the old prefix to the new prefix and then delete the originals. This is an important operational constraint.
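Since / is just a character in keys, "folder" views are computed by grouping, either client-side or by the server's delimiter parameter in ListObjectsV2. A minimal sketch of that grouping logic (an illustration, not MinIO's implementation):

```python
def list_with_delimiter(keys, prefix="", delimiter="/"):
    """Emulate ListObjectsV2 prefix/delimiter grouping.

    Returns (objects, common_prefixes): keys directly under `prefix`,
    plus "subfolder" prefixes rolled up at the first delimiter.
    """
    objects, common = [], set()
    for key in keys:
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        i = rest.find(delimiter)
        if i == -1:
            objects.append(key)                 # a direct child object
        else:
            common.add(prefix + rest[:i + 1])   # a rolled-up "folder"
    return objects, sorted(common)

keys = [
    "builds/2024/01/15/app-v1.2.3.tar.gz",
    "builds/2024/01/15/checksums.sha256",
    "backups/daily/2024-01-15.tar.gz",
    "logs/app/2024-01-15/access.log.gz",
]
# Top level: no direct objects, three "folders"
objs, prefixes = list_with_delimiter(keys)
```

With `prefix="builds/2024/01/15/"` the same call returns the two build artifacts as direct objects and no common prefixes, which is exactly what an S3 console renders as a folder view.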
2. S3 API Compatibility¶
The S3 API is the de facto standard for object storage. Major operations:
Bucket operations:
PUT /bucket → CreateBucket
DELETE /bucket → DeleteBucket
GET / → ListBuckets
GET /bucket?list-type=2 → ListObjectsV2
Object operations:
PUT /bucket/key → PutObject
GET /bucket/key → GetObject
HEAD /bucket/key → HeadObject (metadata only)
DELETE /bucket/key → DeleteObject
PUT /bucket/key (with x-amz-copy-source header) → CopyObject
Multipart:
POST /bucket/key?uploads → CreateMultipartUpload
PUT /bucket/key?partNumber=N&uploadId=X → UploadPart
POST /bucket/key?uploadId=X → CompleteMultipartUpload
DELETE /bucket/key?uploadId=X → AbortMultipartUpload
Pre-signed URLs:
Temporary signed GET/PUT URLs with expiry
What matters for portability: Any tool using the AWS SDK or boto3 works with MinIO, Ceph RGW, or other S3-compatible endpoints by setting:
import boto3

s3 = boto3.client(
    's3',
    endpoint_url='http://minio.example.com:9000',
    aws_access_key_id='minioadmin',
    aws_secret_access_key='minioadmin',
    region_name='us-east-1',  # required by the SDK even if meaningless for MinIO
)
Incompatibilities to watch for: Some advanced S3 features are not universally supported — Select (SQL on objects), Intelligent Tiering, Storage Classes beyond STANDARD/REDUCED_REDUNDANCY, Lambda event notifications. Always test feature support against your specific endpoint version.
3. MinIO Deployment Modes¶
Single-node single-drive (dev/test only, no redundancy):
# Docker
docker run -p 9000:9000 -p 9001:9001 \
-e MINIO_ROOT_USER=minioadmin \
-e MINIO_ROOT_PASSWORD=minioadmin \
-v /data/minio:/data \
minio/minio server /data --console-address ":9001"
Single-node multi-drive (erasure coding across local drives):
# 4 drives on one node — erasure coded
minio server /data1 /data2 /data3 /data4 --console-address ":9001"
# MinIO automatically uses EC:2 (2 parity drives for 4-drive setup)
Distributed MinIO (production: multiple nodes, multiple drives):
# 4 nodes × 4 drives each = 16 drives total, EC:4
minio server \
http://minio{1...4}/data{1...4} \
--console-address ":9001"
# Minimum for distributed: 4 drives total (2 data + 2 parity)
# Recommended for production: at least 8 drives spread across 4+ nodes
Kubernetes (MinIO Operator):
# MinIO Tenant CR
apiVersion: minio.min.io/v2
kind: Tenant
metadata:
  name: myminio
  namespace: minio
spec:
  image: minio/minio:RELEASE.2024-01-01T00-00-00Z
  pools:
    - name: pool-0
      servers: 4
      volumesPerServer: 4
      volumeClaimTemplate:
        spec:
          accessModes:
            - ReadWriteOnce
          resources:
            requests:
              storage: 500Gi
          storageClassName: local-path
  credsSecret:
    name: minio-creds
4. Bucket Policies and IAM-Style Access¶
MinIO uses a subset of AWS IAM policy syntax for access control.
// Allow read-only access to a specific prefix
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:ListBucket"],
      "Resource": [
        "arn:aws:s3:::my-bucket",
        "arn:aws:s3:::my-bucket/public/*"
      ]
    }
  ]
}
# Apply a bucket policy with mc (MinIO Client)
mc alias set local http://localhost:9000 minioadmin minioadmin
mc anonymous set download local/my-bucket/public # public read for prefix
# Set a custom bucket policy from file
mc admin policy create local readonly-policy policy.json
mc admin policy attach local readonly-policy --user readonlyuser
# Create a user and attach a policy
mc admin user add local appuser secretpassword
mc admin policy attach local readwrite --user appuser
# Create an access key for a user
mc admin user svcacct add local appuser
# Returns: Access Key + Secret Key pair
Service Accounts (MinIO): scoped credentials that inherit parent user permissions but can be restricted further. Use for application credentials rather than sharing the root key.
Gotcha: Incomplete multipart uploads are invisible to normal ls commands but consume real disk space. A failed 100 GB upload leaves 100 GB of orphaned parts on the server. Always set a lifecycle rule to abort incomplete multipart uploads after 7 days; this is the single most common source of unexplained storage growth in S3-compatible systems.
5. Lifecycle Rules¶
Lifecycle rules automate object expiration, transitions, and cleanup of incomplete multipart uploads.
// Expire objects in the logs/ prefix after 90 days
// Expire incomplete multipart uploads after 7 days
{
  "Rules": [
    {
      "ID": "expire-logs",
      "Status": "Enabled",
      "Filter": { "Prefix": "logs/" },
      "Expiration": { "Days": 90 }
    },
    {
      "ID": "abort-incomplete-multipart",
      "Status": "Enabled",
      "Filter": { "Prefix": "" },
      "AbortIncompleteMultipartUpload": { "DaysAfterInitiation": 7 }
    }
  ]
}
# Apply lifecycle config with mc
mc ilm import local/my-bucket < lifecycle.json
mc ilm ls local/my-bucket
mc ilm export local/my-bucket
# With AWS CLI
aws --endpoint-url=http://minio:9000 s3api put-bucket-lifecycle-configuration \
--bucket my-bucket --lifecycle-configuration file://lifecycle.json
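To build intuition for how an Expiration rule matches, here is a toy evaluator; a sketch of the matching semantics (prefix plus age), not MinIO's actual scanner, with hypothetical helper names:

```python
from datetime import datetime, timedelta, timezone

def would_expire(key, last_modified, rules, now=None):
    """Return True if any enabled Expiration rule matches `key` and its age."""
    now = now or datetime.now(timezone.utc)
    for rule in rules:
        if rule.get("Status") != "Enabled" or "Expiration" not in rule:
            continue
        prefix = rule.get("Filter", {}).get("Prefix", "")
        # A rule matches on key prefix; the age test uses Days since last write
        if key.startswith(prefix):
            if now - last_modified >= timedelta(days=rule["Expiration"]["Days"]):
                return True
    return False

rules = [{"ID": "expire-logs", "Status": "Enabled",
          "Filter": {"Prefix": "logs/"}, "Expiration": {"Days": 90}}]
now = datetime(2024, 6, 1, tzinfo=timezone.utc)
old = datetime(2024, 1, 1, tzinfo=timezone.utc)        # ~152 days old
would_expire("logs/app/x.gz", old, rules, now)         # True: prefix + age match
would_expire("builds/app.tar", old, rules, now)        # False: prefix doesn't match
```

Note that real implementations apply expiry asynchronously in a background scanner, so objects can outlive their expiration date by hours.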
6. Versioning¶
Versioning keeps every version of an object. A DELETE creates a delete marker; the data still exists.
# Enable versioning
mc version enable local/my-bucket
mc version info local/my-bucket
# List all versions of all objects
mc ls --versions local/my-bucket
# Get a specific version
mc cp --version-id "xxxxx-yyyy" local/my-bucket/file.txt ./recovered.txt
# Remove a specific version permanently
mc rm --version-id "xxxxx-yyyy" local/my-bucket/file.txt
# AWS CLI equivalents
aws --endpoint-url=http://minio:9000 s3api put-bucket-versioning \
--bucket my-bucket --versioning-configuration Status=Enabled
aws --endpoint-url=http://minio:9000 s3api list-object-versions \
--bucket my-bucket --prefix logs/
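The delete-marker behavior is easiest to see in a toy in-memory model; this is a sketch of the semantics, not any real implementation:

```python
import itertools

class VersionedBucket:
    """Toy model of S3 versioning: every PUT adds a version, DELETE
    stacks a delete marker, and plain GET sees only the latest entry."""
    def __init__(self):
        self._versions = {}           # key -> list of (version_id, data or None)
        self._ids = itertools.count(1)

    def put(self, key, data):
        vid = f"v{next(self._ids)}"
        self._versions.setdefault(key, []).append((vid, data))
        return vid

    def delete(self, key):
        # A simple DELETE removes nothing: it appends a delete marker
        self._versions.setdefault(key, []).append((f"v{next(self._ids)}", None))

    def get(self, key, version_id=None):
        versions = self._versions.get(key, [])
        if version_id is not None:
            for vid, data in versions:
                if vid == version_id:
                    return data
            raise KeyError("NoSuchVersion")
        if not versions or versions[-1][1] is None:
            raise KeyError("NoSuchKey")   # latest version is a delete marker
        return versions[-1][1]

b = VersionedBucket()
v1 = b.put("report.txt", b"draft")
b.put("report.txt", b"final")
b.delete("report.txt")                    # GET now 404s...
b.get("report.txt", version_id=v1)        # ...but b"draft" is still recoverable
```

This is also why versioned buckets keep growing until you expire noncurrent versions: a DELETE only hides data, it never frees it.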
7. Object Locking (WORM)¶
Object Lock prevents objects from being deleted or overwritten for a specified period. Used for compliance (WORM — Write Once Read Many).
# Create bucket with object locking enabled (must be done at creation time)
mc mb --with-lock local/compliance-bucket
# Set default retention
mc retention set --default COMPLIANCE "90d" local/compliance-bucket
# Set retention on a specific object
mc retention set GOVERNANCE "30d" local/compliance-bucket/report.pdf
# GOVERNANCE mode: can be bypassed by a user with s3:BypassGovernanceRetention permission
# COMPLIANCE mode: cannot be bypassed by anyone, including root
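A retention period like "90d" resolves to a retain-until timestamp stored with the object; until that instant, deletes are refused. A toy check of that decision logic (an illustration of the mode semantics above, not MinIO's enforcement code):

```python
from datetime import datetime, timedelta, timezone

def can_delete(retain_until, mode, now, bypass_governance=False):
    """WORM check: COMPLIANCE is absolute; GOVERNANCE yields to a
    principal holding s3:BypassGovernanceRetention."""
    if now >= retain_until:
        return True                       # retention window has elapsed
    if mode == "GOVERNANCE" and bypass_governance:
        return True                       # explicit governance bypass
    return False

put_time = datetime(2024, 1, 1, tzinfo=timezone.utc)
retain_until = put_time + timedelta(days=90)   # "90d" default retention
now = datetime(2024, 2, 1, tzinfo=timezone.utc)
can_delete(retain_until, "COMPLIANCE", now)                       # False
can_delete(retain_until, "GOVERNANCE", now, bypass_governance=True)  # True
```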
8. Multipart Upload¶
Objects larger than 5 GB must use multipart upload. AWS SDK and mc do this automatically, but you should understand it for debugging and cleanup.
import boto3

s3 = boto3.client('s3', endpoint_url='http://minio:9000',
                  aws_access_key_id='minioadmin', aws_secret_access_key='minioadmin')

# Manual multipart upload
mpu = s3.create_multipart_upload(Bucket='my-bucket', Key='large-file.bin')
upload_id = mpu['UploadId']

parts = []
part_size = 50 * 1024 * 1024  # 50 MB parts (minimum 5 MiB, except the last part)

with open('large-file.bin', 'rb') as f:
    part_num = 1
    while chunk := f.read(part_size):
        resp = s3.upload_part(
            Bucket='my-bucket', Key='large-file.bin',
            UploadId=upload_id, PartNumber=part_num, Body=chunk
        )
        parts.append({'PartNumber': part_num, 'ETag': resp['ETag']})
        part_num += 1

s3.complete_multipart_upload(
    Bucket='my-bucket', Key='large-file.bin',
    UploadId=upload_id,
    MultipartUpload={'Parts': parts}
)
# List incomplete multipart uploads (these consume space!)
mc ls --incomplete local/my-bucket
aws --endpoint-url=http://minio:9000 s3api list-multipart-uploads --bucket my-bucket
# Abort an incomplete multipart upload
aws --endpoint-url=http://minio:9000 s3api abort-multipart-upload \
--bucket my-bucket --key large-file.bin --upload-id <upload-id>
# Prevent accumulation: lifecycle rule to abort incomplete MPUs after 7 days (see above)
9. Presigned URLs¶
Generate a time-limited URL to give temporary access to a private object — no AWS credentials needed to use it.
import boto3

s3 = boto3.client('s3', endpoint_url='http://minio:9000',
                  aws_access_key_id='minioadmin', aws_secret_access_key='minioadmin')

# Generate presigned GET URL (valid for 1 hour)
url = s3.generate_presigned_url(
    'get_object',
    Params={'Bucket': 'my-bucket', 'Key': 'reports/monthly.pdf'},
    ExpiresIn=3600
)
# Share url with external user; no auth required to use it

# Generate presigned PUT URL (allow uploads without credentials)
put_url = s3.generate_presigned_url(
    'put_object',
    Params={'Bucket': 'my-bucket', 'Key': 'uploads/user123/file.jpg',
            'ContentType': 'image/jpeg'},
    ExpiresIn=600  # 10 minutes
)
# mc presigned URL
mc share download local/my-bucket/reports/monthly.pdf --expire 2h
mc share upload local/my-bucket/uploads/ --expire 1h
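Under the hood, generate_presigned_url performs AWS Signature Version 4 query signing: it builds a canonical request, derives a signing key from the secret via an HMAC chain, and appends the signature as a query parameter. A stripped-down sketch of that flow (path-style URL, host header only; real SDKs handle many more edge cases, so treat this as illustration):

```python
import hashlib
import hmac
from datetime import datetime, timezone
from urllib.parse import quote

def presign_get(bucket, key, access_key, secret_key, endpoint_host,
                region="us-east-1", expires=3600, now=None):
    """Sketch of SigV4 query-string presigning for a GET request."""
    now = now or datetime.now(timezone.utc)
    amz_date = now.strftime("%Y%m%dT%H%M%SZ")
    datestamp = now.strftime("%Y%m%d")
    scope = f"{datestamp}/{region}/s3/aws4_request"

    params = {
        "X-Amz-Algorithm": "AWS4-HMAC-SHA256",
        "X-Amz-Credential": f"{access_key}/{scope}",
        "X-Amz-Date": amz_date,
        "X-Amz-Expires": str(expires),
        "X-Amz-SignedHeaders": "host",
    }
    # Canonical query string: sorted, fully URL-encoded
    query = "&".join(f"{quote(k, safe='')}={quote(v, safe='')}"
                     for k, v in sorted(params.items()))
    canonical_request = "\n".join([
        "GET",
        f"/{bucket}/{quote(key)}",
        query,
        f"host:{endpoint_host}\n",     # canonical headers (trailing newline)
        "host",                        # signed header names
        "UNSIGNED-PAYLOAD",            # presigned URLs don't hash the body
    ])
    string_to_sign = "\n".join([
        "AWS4-HMAC-SHA256", amz_date, scope,
        hashlib.sha256(canonical_request.encode()).hexdigest(),
    ])
    # Derive the signing key via the HMAC chain, then sign
    k = f"AWS4{secret_key}".encode()
    for part in (datestamp, region, "s3", "aws4_request"):
        k = hmac.new(k, part.encode(), hashlib.sha256).digest()
    signature = hmac.new(k, string_to_sign.encode(), hashlib.sha256).hexdigest()
    return (f"http://{endpoint_host}/{bucket}/{quote(key)}"
            f"?{query}&X-Amz-Signature={signature}")

url = presign_get("my-bucket", "reports/monthly.pdf",
                  "minioadmin", "minioadmin", "minio:9000")
```

The key property: signing is purely local (no network call), and the server independently recomputes the same signature to validate the request. That is why clock skew between client and server breaks presigned URLs.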
10. MinIO Client (mc) Commands¶
# Alias setup
mc alias set local http://localhost:9000 minioadmin minioadmin
mc alias set prod https://minio.prod.example.com ACCESS_KEY SECRET_KEY
mc alias ls
# Bucket operations
mc mb local/new-bucket
mc rb local/empty-bucket
mc ls local/
mc du local/my-bucket # disk usage
# Object operations
mc cp file.txt local/my-bucket/path/file.txt
mc mv local/my-bucket/old.txt local/my-bucket/new.txt # server-side copy + delete, not an atomic rename
mc rm local/my-bucket/file.txt
mc cat local/my-bucket/config.yaml # print object contents
mc head local/my-bucket/data.csv # first 10 lines
# Sync (rsync-like)
mc mirror /local/path local/my-bucket/backup/
mc mirror --remove local/my-bucket/src/ local/my-bucket/dst/ # delete extras
# Bulk operations
mc rm --recursive --force local/my-bucket/old-logs/
mc cp --recursive local/src-bucket/ local/dst-bucket/
# Server admin
mc admin info local
mc admin user ls local
mc admin policy ls local
mc admin service restart local
11. Ceph RGW as S3 Endpoint¶
Ceph's RADOS Gateway (RGW) is a fully S3-compatible API on top of RADOS.
# Deploy via cephadm
ceph orch apply rgw myzone '--placement=label:rgw count-per-host:2'
# Create a user
radosgw-admin user create \
--uid=s3user \
--display-name="Application S3 User" \
--access-key=AKIAIOSFODNN7EXAMPLE \
--secret-key=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
# Set quotas
radosgw-admin quota set --quota-scope=user --uid=s3user \
--max-size=100G --max-objects=1000000
radosgw-admin quota enable --quota-scope=user --uid=s3user
# Use with AWS CLI
aws --endpoint-url=http://rgw-host:7480 s3 mb s3://my-bucket
aws --endpoint-url=http://rgw-host:7480 s3 cp file.txt s3://my-bucket/
aws --endpoint-url=http://rgw-host:7480 s3 ls s3://my-bucket/
# RGW performance tuning
# Multiple RGW instances behind a load balancer for throughput
# Beast frontend (the default since Nautilus) outperforms the old Civetweb
12. Performance Characteristics vs Block Storage¶
| Property | Object Storage | Block Storage |
|---|---|---|
| Access pattern | Whole-object GET/PUT | Random read/write |
| Latency | 10-100ms per operation | <1ms (NVMe), ~5ms (SSD) |
| Throughput | High for large objects | High for small random I/O |
| Metadata operations | Slow (ListObjects is expensive) | Fast (filesystem) |
| Consistency | Strong read-after-write on AWS S3 (since Dec 2020); varies by implementation | Strong |
| Rename | Not atomic — must copy+delete | Atomic |
| Max object size | 5 TB (S3) | Filesystem-dependent |
| Best use case | Large files, infrequent access, high volume | Databases, OS, high-IOPS workloads |
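The "ListObjects is expensive" row is worth quantifying: listing is paginated at 1,000 keys per request, so enumerating a large bucket is a chain of sequential round-trips. A rough back-of-envelope calculation, assuming 50 ms per request:

```python
import math

def list_time_estimate(num_objects, page_size=1000, latency_s=0.050):
    """Rough sequential ListObjectsV2 cost: page count * per-request latency."""
    pages = math.ceil(num_objects / page_size)
    return pages, pages * latency_s

# 10M objects -> 10,000 sequential pages -> ~500 s just to list the bucket
pages, seconds = list_time_estimate(10_000_000)
```

This is why tools that walk S3-compatible storage (mirror jobs, backup verification) shard listings across prefixes and run them in parallel, and why key schemes with well-distributed prefixes matter at scale.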
13. Use Cases in Modern Infrastructure¶
Artifact storage: Build outputs, Docker image layers (Harbor), Helm chart repos, APT/RPM repos.
Backup target: Velero (Kubernetes backup), Restic, Barman (PostgreSQL), mysqldump destinations.
Loki chunk storage:
# Loki config
schema_config:
  configs:
    - from: 2024-01-01
      store: boltdb-shipper
      object_store: s3
      schema: v11
      index:
        prefix: index_
        period: 24h
storage_config:
  aws:
    s3: http://minioadmin:minioadmin@minio:9000/loki
    s3forcepathstyle: true  # required for MinIO
Thanos object storage:
# thanos-storage.yaml (objstore.yml)
type: S3
config:
  bucket: thanos
  endpoint: minio:9000
  access_key: minioadmin
  secret_key: minioadmin
  insecure: true
Tempo (tracing):
storage:
  trace:
    backend: s3
    s3:
      bucket: tempo
      endpoint: minio:9000
      access_key: minioadmin
      secret_key: minioadmin
      insecure: true
Quick Reference¶
# mc quick reference
mc alias set local http://localhost:9000 minioadmin minioadmin
mc ls local/ # list buckets
mc ls local/my-bucket/ # list objects
mc cp file.txt local/my-bucket/ # upload
mc cp local/my-bucket/file.txt . # download
mc rm local/my-bucket/file.txt # delete
mc mirror ./dir/ local/my-bucket/ # sync directory
# AWS CLI quick reference
export AWS_ENDPOINT_URL=http://localhost:9000
export AWS_ACCESS_KEY_ID=minioadmin
export AWS_SECRET_ACCESS_KEY=minioadmin
aws s3 ls
aws s3 cp file.txt s3://my-bucket/
aws s3 sync ./dir s3://my-bucket/backup/
aws s3 rm s3://my-bucket/old.txt
aws s3api get-object-attributes --bucket my-bucket --key file.txt \
--object-attributes ETag Checksum ObjectSize
Wiki Navigation¶
Prerequisites¶
- Storage Operations (Topic Pack, L2)