

S3-Compatible Object Storage — Primer

Why This Matters

Object storage (S3, MinIO, Ceph RGW) is the standard backend for artifact storage, backups, log archiving, data lake landing zones, and modern infrastructure components like Loki, Thanos, Velero, and Harbor. Understanding the S3 API, how to operate MinIO, how to configure access policies, and how to tune for performance lets you confidently run object storage anywhere — on-prem, in Kubernetes, or in the cloud — and integrate it with any S3-aware tool.

Core Concepts

1. Object Storage Model

Name origin: S3 stands for "Simple Storage Service" — launched by AWS in March 2006, it was one of the first commercially available cloud services. Its HTTP-based REST API became the de facto standard, which is why MinIO, Ceph RGW, and dozens of other implementations speak "S3-compatible."

Object storage has no filesystem hierarchy. There are no directories — only buckets and objects. "Folders" in S3-compatible UIs are just a naming convention where key prefixes contain /.

Bucket: my-artifacts
Objects:
  builds/2024/01/15/app-v1.2.3.tar.gz   ← full object key
  builds/2024/01/15/checksums.sha256
  backups/daily/2024-01-15.tar.gz
  logs/app/2024-01-15/access.log.gz
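The "folders" in the listing above can be reproduced in a few lines of pure Python. This is a sketch of the ListObjectsV2 Delimiter roll-up (no server needed; `list_with_delimiter` is a hypothetical helper, not part of any SDK):

```python
# Sketch: how S3 "folders" emerge from flat keys. Mimics ListObjectsV2
# with Delimiter='/': keys sharing a prefix up to the next '/' roll up
# into CommonPrefixes; the rest are returned as Contents.
def list_with_delimiter(keys, prefix="", delimiter="/"):
    contents, common = [], set()
    for key in keys:
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        if delimiter in rest:
            common.add(prefix + rest.split(delimiter, 1)[0] + delimiter)
        else:
            contents.append(key)
    return sorted(contents), sorted(common)

keys = [
    "builds/2024/01/15/app-v1.2.3.tar.gz",
    "builds/2024/01/15/checksums.sha256",
    "backups/daily/2024-01-15.tar.gz",
    "logs/app/2024-01-15/access.log.gz",
]
print(list_with_delimiter(keys))                      # top level: only "folders"
print(list_with_delimiter(keys, "builds/2024/01/15/"))  # leaf level: only objects
```

Note that the server computes this grouping per request; nothing folder-like is stored.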

Key concepts:
- Bucket: flat namespace container. Globally unique name within the endpoint.
- Object: a blob of bytes (0 bytes up to 5 TB via multipart) plus metadata.
- Key: the full path string that identifies an object within a bucket. / has no special filesystem meaning — it's just a character.
- Metadata: HTTP headers attached to objects (Content-Type, custom x-amz-meta- headers, ETag, Last-Modified).
- ETag: MD5 of the object for single-part uploads; for multipart, the MD5 of the concatenated binary part digests plus a "-<part count>" suffix. Use it for integrity verification.

No directories means: you can't atomically rename a "directory" — you must copy all objects with the new prefix then delete the old ones. This is an important operational constraint.
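A minimal simulation of that constraint, using a plain dict in place of a real bucket (`rename_prefix` is a hypothetical helper; against a live endpoint each line inside the loop would be a separate CopyObject/DeleteObject request, observable mid-flight by readers):

```python
# Sketch (pure simulation, no real bucket): "renaming" a prefix is a
# server-side copy of every matching key followed by a delete of the old
# key. There is no single atomic operation for this.
def rename_prefix(bucket: dict, old: str, new: str) -> None:
    for key in [k for k in bucket if k.startswith(old)]:
        bucket[new + key[len(old):]] = bucket[key]   # CopyObject
        del bucket[key]                              # DeleteObject

bucket = {"logs/a.gz": b"...", "logs/b.gz": b"...", "data/c": b"..."}
rename_prefix(bucket, "logs/", "archive/logs/")
print(sorted(bucket))
```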

2. S3 API Compatibility

The S3 API is the de facto standard for object storage. Major operations:

Bucket operations:
  PUT /bucket             CreateBucket
  DELETE /bucket          DeleteBucket
  GET /                   ListBuckets
  GET /bucket?list-type=2 ListObjectsV2

Object operations:
  PUT /bucket/key         PutObject
  GET /bucket/key         GetObject
  HEAD /bucket/key        HeadObject (metadata only)
  DELETE /bucket/key      DeleteObject
  PUT /bucket/key (+ x-amz-copy-source header)   CopyObject

Multipart:
  POST /bucket/key?uploads                     CreateMultipartUpload
  PUT /bucket/key?partNumber=N&uploadId=X      UploadPart
  POST /bucket/key?uploadId=X                  CompleteMultipartUpload
  DELETE /bucket/key?uploadId=X                AbortMultipartUpload

Pre-signed URLs:
  Temporary signed GET/PUT URLs with expiry

What matters for portability: Any tool using the AWS SDK or boto3 works with MinIO, Ceph RGW, or other S3-compatible endpoints by setting:

import boto3
s3 = boto3.client('s3',
    endpoint_url='http://minio.example.com:9000',
    aws_access_key_id='minioadmin',
    aws_secret_access_key='minioadmin',
    region_name='us-east-1'   # required by SDK even if meaningless for MinIO
)

Incompatibilities to watch for: Some advanced S3 features are not universally supported — Select (SQL on objects), Intelligent Tiering, Storage Classes beyond STANDARD/REDUCED_REDUNDANCY, Lambda event notifications. Always test feature support against your specific endpoint version.

3. MinIO Deployment Modes

Single-node single-drive (dev/test only, no redundancy):

# Docker
docker run -p 9000:9000 -p 9001:9001 \
  -e MINIO_ROOT_USER=minioadmin \
  -e MINIO_ROOT_PASSWORD=minioadmin \
  -v /data/minio:/data \
  minio/minio server /data --console-address ":9001"

Single-node multi-drive (erasure coding across local drives):

# 4 drives on one node — erasure coded
minio server /data1 /data2 /data3 /data4 --console-address ":9001"
# MinIO automatically uses EC:2 (2 parity drives for 4-drive setup)

Distributed MinIO (production: multiple nodes, multiple drives):

# 4 nodes × 4 drives each = 16 drives total, EC:4
minio server \
  http://minio{1...4}/data{1...4} \
  --console-address ":9001"

# Minimum for distributed: 4 drives total (2 data + 2 parity)
# Recommended for production: at least 8 drives spread across 4+ nodes
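The capacity arithmetic behind these layouts can be sketched as follows. This illustrates the general erasure-coding trade-off, not MinIO's internal striping across erasure sets; `ec_summary` is a hypothetical helper:

```python
# Illustrative erasure-coding arithmetic: with N drives and M parity
# shards per stripe, up to M drives may fail without data loss, and
# usable capacity is (N - M)/N of raw capacity.
def ec_summary(total_drives: int, parity: int, drive_size_gib: int) -> dict:
    usable = (total_drives - parity) * drive_size_gib
    return {
        "tolerated_failures": parity,
        "raw_gib": total_drives * drive_size_gib,
        "usable_gib": usable,
    }

print(ec_summary(16, 4, 500))  # the 4-node x 4-drive layout above with EC:4
print(ec_summary(4, 2, 500))   # the minimal 4-drive layout with EC:2
```

So EC:4 over 16 drives gives 75% usable capacity while surviving four drive failures, versus 50% usable for 4 drives at EC:2.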

Kubernetes (MinIO Operator):

# MinIO Tenant CR
apiVersion: minio.min.io/v2
kind: Tenant
metadata:
  name: myminio
  namespace: minio
spec:
  image: minio/minio:RELEASE.2024-01-01T00-00-00Z
  pools:
  - name: pool-0
    servers: 4
    volumesPerServer: 4
    volumeClaimTemplate:
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 500Gi
        storageClassName: local-path
  credsSecret:
    name: minio-creds

4. Bucket Policies and IAM-Style Access

MinIO uses a subset of AWS IAM policy syntax for access control.

// Allow read-only access to a specific prefix
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:ListBucket"],
      "Resource": [
        "arn:aws:s3:::my-bucket",
        "arn:aws:s3:::my-bucket/public/*"
      ]
    }
  ]
}
# Apply a bucket policy with mc (MinIO Client)
mc alias set local http://localhost:9000 minioadmin minioadmin
mc anonymous set download local/my-bucket/public   # public read for prefix

# Set a custom bucket policy from file
mc admin policy create local readonly-policy policy.json
mc admin policy attach local readonly-policy --user readonlyuser

# Create a user and attach a policy
mc admin user add local appuser secretpassword
mc admin policy attach local readwrite --user appuser

# Create an access key for a user
mc admin user svcacct add local appuser
# Returns: Access Key + Secret Key pair

Service Accounts (MinIO): scoped credentials that inherit parent user permissions but can be restricted further. Use for application credentials rather than sharing the root key.
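A policy document like the one above can be generated instead of hand-written. A stdlib-only sketch (`readonly_policy` is a hypothetical helper): note that s3:ListBucket targets the bucket ARN while s3:GetObject targets object ARNs, and mixing these up is a classic policy bug.

```python
import json

# Build a read-only policy for one prefix of one bucket (hypothetical
# helper; mirrors the JSON example above).
def readonly_policy(bucket: str, prefix: str) -> str:
    return json.dumps({
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:ListBucket"],
            "Resource": [
                f"arn:aws:s3:::{bucket}",              # bucket ARN: ListBucket
                f"arn:aws:s3:::{bucket}/{prefix}/*",   # object ARNs: GetObject
            ],
        }],
    }, indent=2)

print(readonly_policy("my-bucket", "public"))
```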

Gotcha: Incomplete multipart uploads are invisible in normal ls commands but consume real disk space. A failed 100 GB upload leaves 100 GB of orphaned parts on the server. Always set a lifecycle rule to abort incomplete multipart uploads after 7 days — this is the single most common source of unexplained storage growth in S3-compatible systems.

5. Lifecycle Rules

Lifecycle rules automate object expiration, transitions, and cleanup of incomplete multipart uploads.

// Expire objects in the logs/ prefix after 90 days
// Expire incomplete multipart uploads after 7 days
{
  "Rules": [
    {
      "ID": "expire-logs",
      "Status": "Enabled",
      "Filter": { "Prefix": "logs/" },
      "Expiration": { "Days": 90 }
    },
    {
      "ID": "abort-incomplete-multipart",
      "Status": "Enabled",
      "Filter": { "Prefix": "" },
      "AbortIncompleteMultipartUpload": { "DaysAfterInitiation": 7 }
    }
  ]
}
# Apply lifecycle config with mc
mc ilm import local/my-bucket < lifecycle.json
mc ilm ls local/my-bucket
mc ilm export local/my-bucket

# With AWS CLI
aws --endpoint-url=http://minio:9000 s3api put-bucket-lifecycle-configuration \
  --bucket my-bucket --lifecycle-configuration file://lifecycle.json
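The evaluation these rules imply can be sketched in a few lines. This illustrates only the matching logic, not the server's actual scanner; `is_expired` is a hypothetical helper:

```python
from datetime import datetime, timedelta, timezone

# Sketch: an object matches an expiration rule when its key carries the
# rule's prefix and its age exceeds the rule's Days threshold.
def is_expired(key, last_modified, rule, now=None):
    now = now or datetime.now(timezone.utc)
    if not key.startswith(rule["Filter"]["Prefix"]):
        return False
    return now - last_modified > timedelta(days=rule["Expiration"]["Days"])

rule = {"Filter": {"Prefix": "logs/"}, "Expiration": {"Days": 90}}
old = datetime.now(timezone.utc) - timedelta(days=120)
print(is_expired("logs/app/old.gz", old, rule))    # old enough, prefix matches
print(is_expired("builds/app.tar.gz", old, rule))  # wrong prefix, never expires
```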

6. Versioning

Versioning keeps every version of an object. A DELETE creates a delete marker; the data still exists.

# Enable versioning
mc version enable local/my-bucket
mc version info local/my-bucket

# List all versions of all objects
mc ls --versions local/my-bucket

# Get a specific version
mc cp --version-id "xxxxx-yyyy" local/my-bucket/file.txt ./recovered.txt

# Remove a specific version permanently
mc rm --version-id "xxxxx-yyyy" local/my-bucket/file.txt

# AWS CLI equivalents
aws --endpoint-url=http://minio:9000 s3api put-bucket-versioning \
  --bucket my-bucket --versioning-configuration Status=Enabled

aws --endpoint-url=http://minio:9000 s3api list-object-versions \
  --bucket my-bucket --prefix logs/
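The version-stack behavior can be illustrated with a small in-memory model. This is a sketch assuming simplified sequential version IDs; real endpoints return opaque version IDs:

```python
# Sketch of one versioned key's history: each PUT appends a version,
# DELETE appends a delete marker, and a plain GET reads the latest entry.
class VersionedKey:
    def __init__(self):
        self.versions = []  # list of (version_id, payload or None)

    def put(self, data):
        self.versions.append((f"v{len(self.versions) + 1}", data))

    def delete(self):
        # A delete marker: no payload, but nothing is destroyed.
        self.versions.append((f"v{len(self.versions) + 1}", None))

    def get(self, version_id=None):
        if version_id:
            return dict(self.versions)[version_id]
        vid, data = self.versions[-1]
        if data is None:
            raise KeyError("NoSuchKey (latest version is a delete marker)")
        return data

k = VersionedKey()
k.put(b"v1 contents")
k.put(b"v2 contents")
k.delete()
# A plain GET now fails, but every version is still retrievable by id:
print(k.get("v2"))
```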

7. Object Locking (WORM)

Object Lock prevents objects from being deleted or overwritten for a specified period. Used for compliance (WORM — Write Once Read Many).

# Create bucket with object locking enabled (must be done at creation time)
mc mb --with-lock local/compliance-bucket

# Set default retention
mc retention set --default COMPLIANCE "90d" local/compliance-bucket

# Set retention on a specific object
mc retention set GOVERNANCE "30d" local/compliance-bucket/report.pdf

# GOVERNANCE mode: can be bypassed by a user with s3:BypassGovernanceRetention permission
# COMPLIANCE mode: cannot be bypassed by anyone, including root
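The difference between the two modes reduces to a single decision. A sketch under the assumptions above (`can_delete` is a hypothetical helper, not a real API):

```python
from datetime import datetime, timezone

# Sketch of the deletion decision: COMPLIANCE never yields before
# retain_until; GOVERNANCE yields only to a bypass-capable caller.
def can_delete(mode, retain_until, now, has_bypass=False):
    if now >= retain_until:
        return True               # retention window has passed
    if mode == "GOVERNANCE":
        return has_bypass         # s3:BypassGovernanceRetention
    return False                  # COMPLIANCE: nobody, not even root

until = datetime(2025, 1, 1, tzinfo=timezone.utc)
now = datetime(2024, 6, 1, tzinfo=timezone.utc)
print(can_delete("COMPLIANCE", until, now, has_bypass=True))  # still locked
print(can_delete("GOVERNANCE", until, now, has_bypass=True))  # bypass works
```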

8. Multipart Upload

Objects larger than 5 GB must use multipart upload. AWS SDK and mc do this automatically, but you should understand it for debugging and cleanup.

import boto3
s3 = boto3.client('s3', endpoint_url='http://minio:9000',
    aws_access_key_id='minioadmin', aws_secret_access_key='minioadmin')

# Manual multipart upload
mpu = s3.create_multipart_upload(Bucket='my-bucket', Key='large-file.bin')
upload_id = mpu['UploadId']

parts = []
part_size = 50 * 1024 * 1024  # 50 MB parts (minimum 5 MB)
with open('large-file.bin', 'rb') as f:
    part_num = 1
    while chunk := f.read(part_size):
        resp = s3.upload_part(
            Bucket='my-bucket', Key='large-file.bin',
            UploadId=upload_id, PartNumber=part_num, Body=chunk
        )
        parts.append({'PartNumber': part_num, 'ETag': resp['ETag']})
        part_num += 1

s3.complete_multipart_upload(
    Bucket='my-bucket', Key='large-file.bin',
    UploadId=upload_id,
    MultipartUpload={'Parts': parts}
)
# List incomplete multipart uploads (these consume space!)
mc ls --incomplete local/my-bucket
aws --endpoint-url=http://minio:9000 s3api list-multipart-uploads --bucket my-bucket

# Abort an incomplete multipart upload
aws --endpoint-url=http://minio:9000 s3api abort-multipart-upload \
  --bucket my-bucket --key large-file.bin --upload-id <upload-id>

# Prevent accumulation: lifecycle rule to abort incomplete MPUs after 7 days (see above)
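The multipart ETag convention from the Core Concepts section can be verified locally with hashlib. This sketches the AWS/MinIO convention for uploads without SSE-KMS; the helper names are hypothetical:

```python
import hashlib

# Multipart ETag: MD5 over the concatenated binary MD5 digests of each
# part, with a "-<part count>" suffix. Not the MD5 of the whole object.
def multipart_etag(parts):
    digests = b"".join(hashlib.md5(p).digest() for p in parts)
    return f"{hashlib.md5(digests).hexdigest()}-{len(parts)}"

# Single-part ETag: plain MD5 hex of the object bytes.
def single_part_etag(data):
    return hashlib.md5(data).hexdigest()

parts = [b"a" * 5, b"b" * 5]
print(multipart_etag(parts))       # note the "-2" suffix: two parts
print(single_part_etag(b"".join(parts)))  # differs from the multipart ETag
```

This is why comparing a local file's MD5 against a multipart ETag fails: you must recompute per-part digests with the same part size the uploader used.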

9. Presigned URLs

Generate a time-limited URL to give temporary access to a private object — no AWS credentials needed to use it.

import boto3
s3 = boto3.client('s3', endpoint_url='http://minio:9000',
    aws_access_key_id='minioadmin', aws_secret_access_key='minioadmin')

# Generate presigned GET URL (valid for 1 hour)
url = s3.generate_presigned_url(
    'get_object',
    Params={'Bucket': 'my-bucket', 'Key': 'reports/monthly.pdf'},
    ExpiresIn=3600
)
# Share url with external user — no auth required

# Generate presigned PUT URL (allow uploads without credentials)
put_url = s3.generate_presigned_url(
    'put_object',
    Params={'Bucket': 'my-bucket', 'Key': 'uploads/user123/file.jpg',
            'ContentType': 'image/jpeg'},
    ExpiresIn=600   # 10 minutes
)
# mc presigned URL
mc share download local/my-bucket/reports/monthly.pdf --expire 2h
mc share upload local/my-bucket/uploads/ --expire 1h
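Because a SigV4 presigned URL carries its lifetime in the query string (X-Amz-Date is the signing time, X-Amz-Expires the validity in seconds), expiry can be checked offline with the stdlib alone. A sketch: `presigned_expired` is a hypothetical helper and the sample URL below is fabricated:

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import urlparse, parse_qs

# Sketch: parse the signing time and lifetime out of a presigned URL and
# decide whether it has expired, without contacting the server.
def presigned_expired(url, now=None):
    qs = parse_qs(urlparse(url).query)
    signed_at = datetime.strptime(qs["X-Amz-Date"][0], "%Y%m%dT%H%M%SZ")
    signed_at = signed_at.replace(tzinfo=timezone.utc)
    lifetime = timedelta(seconds=int(qs["X-Amz-Expires"][0]))
    return (now or datetime.now(timezone.utc)) > signed_at + lifetime

url = ("http://minio:9000/my-bucket/reports/monthly.pdf"
       "?X-Amz-Date=20240115T120000Z&X-Amz-Expires=3600&X-Amz-Signature=abc")
print(presigned_expired(url, datetime(2024, 1, 15, 14, 0, tzinfo=timezone.utc)))
```

Note the server is still authoritative: it rejects an expired signature regardless of what the client computes.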

10. MinIO Client (mc) Commands

# Alias setup
mc alias set local http://localhost:9000 minioadmin minioadmin
mc alias set prod https://minio.prod.example.com ACCESS_KEY SECRET_KEY
mc alias ls

# Bucket operations
mc mb local/new-bucket
mc rb local/empty-bucket
mc ls local/
mc du local/my-bucket   # disk usage

# Object operations
mc cp file.txt local/my-bucket/path/file.txt
mc mv local/my-bucket/old.txt local/my-bucket/new.txt   # actually copy+delete
mc rm local/my-bucket/file.txt
mc cat local/my-bucket/config.yaml       # print object contents
mc head local/my-bucket/data.csv         # first 10 lines

# Sync (rsync-like)
mc mirror /local/path local/my-bucket/backup/
mc mirror --remove local/my-bucket/src/ local/my-bucket/dst/  # delete extras

# Bulk operations
mc rm --recursive --force local/my-bucket/old-logs/
mc cp --recursive local/src-bucket/ local/dst-bucket/

# Server admin
mc admin info local
mc admin user ls local
mc admin policy ls local
mc admin service restart local

11. Ceph RGW as S3 Endpoint

Ceph's RADOS Gateway (RGW) is a fully S3-compatible API on top of RADOS.

# Deploy via cephadm
ceph orch apply rgw myzone --placement="label:rgw count-per-host:2"

# Create a user
radosgw-admin user create \
  --uid=s3user \
  --display-name="Application S3 User" \
  --access-key=AKIAIOSFODNN7EXAMPLE \
  --secret-key=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY

# Set quotas
radosgw-admin quota set --quota-scope=user --uid=s3user \
  --max-size=100G --max-objects=1000000
radosgw-admin quota enable --quota-scope=user --uid=s3user

# Use with AWS CLI
aws --endpoint-url=http://rgw-host:7480 s3 mb s3://my-bucket
aws --endpoint-url=http://rgw-host:7480 s3 cp file.txt s3://my-bucket/
aws --endpoint-url=http://rgw-host:7480 s3 ls s3://my-bucket/

# RGW performance tuning
# Multiple RGW instances behind a load balancer for throughput
# Beast frontend (the default since Nautilus) outperforms the older Civetweb

12. Performance Characteristics vs Block Storage

Property             Object Storage                                    Block Storage
Access pattern       Whole-object GET/PUT                              Random read/write
Latency              10-100 ms per operation                           <1 ms (NVMe), ~5 ms (SSD)
Throughput           High for large objects                            High for small random I/O
Metadata operations  Slow (ListObjects is expensive)                   Fast (filesystem)
Consistency          Read-after-write (AWS S3 since Dec 2020);         Strong
                     varies by implementation
Rename               Not atomic — must copy+delete                     Atomic
Max object size      5 TB (S3)                                         Filesystem-dependent
Best use case        Large files, infrequent access, high volume       Databases, OS, high-IOPS workloads

13. Use Cases in Modern Infrastructure

Artifact storage: Build outputs, Docker image layers (Harbor), Helm chart repos, APT/RPM repos.

Backup target: Velero (Kubernetes backup), Restic, Barman (PostgreSQL), mysqldump destinations.

Loki chunk storage:

# Loki config
schema_config:
  configs:
  - from: 2024-01-01
    store: boltdb-shipper
    object_store: s3
    schema: v11
    index:
      prefix: index_
      period: 24h

storage_config:
  aws:
    s3: http://minioadmin:minioadmin@minio:9000/loki
    s3forcepathstyle: true  # required for MinIO

Thanos object storage:

# thanos-storage.yaml (objstore.yml)
type: S3
config:
  bucket: thanos
  endpoint: minio:9000
  access_key: minioadmin
  secret_key: minioadmin
  insecure: true

Tempo (tracing):

storage:
  trace:
    backend: s3
    s3:
      bucket: tempo
      endpoint: minio:9000
      access_key: minioadmin
      secret_key: minioadmin
      insecure: true

Quick Reference

# mc quick reference
mc alias set local http://localhost:9000 minioadmin minioadmin
mc ls local/                          # list buckets
mc ls local/my-bucket/               # list objects
mc cp file.txt local/my-bucket/      # upload
mc cp local/my-bucket/file.txt .     # download
mc rm local/my-bucket/file.txt       # delete
mc mirror ./dir/ local/my-bucket/    # sync directory

# AWS CLI quick reference
export AWS_ENDPOINT_URL=http://localhost:9000
export AWS_ACCESS_KEY_ID=minioadmin
export AWS_SECRET_ACCESS_KEY=minioadmin
aws s3 ls
aws s3 cp file.txt s3://my-bucket/
aws s3 sync ./dir s3://my-bucket/backup/
aws s3 rm s3://my-bucket/old.txt
aws s3api get-object-attributes --bucket my-bucket --key file.txt \
  --object-attributes ETag Checksum ObjectSize
