
How S3 works under the hood


In a single paragraph, we can say:

“File info and the actual data are kept separate. The metadata goes into a big database, and the files themselves are broken into chunks and stored on huge storage arrays. The database also keeps track of where each chunk lives and stores a hash of each chunk, so corruption or tampering can be detected.

Every file is stored in at least three different data centres at the same time, just to be safe.”
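
The summary above can be sketched in a few lines of Python. This is a minimal illustration, not S3's actual implementation: the chunk size, the round-robin placement, the node names, and the `put`/`get` helpers are all assumptions made for the example, and a plain dictionary stands in for the storage arrays.

```python
import hashlib
from dataclasses import dataclass, field

CHUNK_SIZE = 4  # bytes; deliberately tiny so the example is easy to follow


@dataclass
class ChunkRecord:
    node: str      # hypothetical storage node holding the chunk
    chunk_id: str  # key of the chunk on that node
    sha256: str    # checksum recorded at write time


@dataclass
class ObjectMetadata:
    name: str
    size: int
    chunks: list = field(default_factory=list)  # ordered ChunkRecord entries


def put(name, data, nodes, storage):
    """Split data into chunks, store them, and return the metadata record."""
    meta = ObjectMetadata(name=name, size=len(data))
    for start in range(0, len(data), CHUNK_SIZE):
        chunk = data[start:start + CHUNK_SIZE]
        index = start // CHUNK_SIZE
        node = nodes[index % len(nodes)]  # naive round-robin placement (assumption)
        chunk_id = f"{name}#{index}"
        storage[(node, chunk_id)] = chunk
        checksum = hashlib.sha256(chunk).hexdigest()
        meta.chunks.append(ChunkRecord(node, chunk_id, checksum))
    return meta


def get(meta, storage):
    """Reassemble the object, verifying every chunk against its stored hash."""
    parts = []
    for record in meta.chunks:
        chunk = storage[(record.node, record.chunk_id)]
        if hashlib.sha256(chunk).hexdigest() != record.sha256:
            raise ValueError(f"checksum mismatch for {record.chunk_id}")
        parts.append(chunk)
    return b"".join(parts)
```

The metadata record never contains the bytes themselves, only where each chunk lives and what its hash should be, which is the separation the paragraph describes.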

Key Components of S3:

Metadata Service

  • Stores bucket and object metadata.
  • Uses a composite sharding key (a hash of bucket name + object name) to scale out and keep lookups efficient.
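
A composite sharding key of this kind might look like the sketch below. The shard count, the separator byte, and the `shard_for` name are assumptions for illustration; the source only says that the key is a hash of the bucket and object names.

```python
import hashlib

NUM_SHARDS = 16  # assumed shard count, chosen only for illustration


def shard_for(bucket: str, object_name: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a (bucket, object name) pair to a metadata shard with a stable hash."""
    # The NUL separator keeps ("ab", "c") and ("a", "bc") from producing the same key.
    key = f"{bucket}\x00{object_name}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    # Use the first 8 bytes of the digest as an integer, then reduce to a shard index.
    return int.from_bytes(digest[:8], "big") % num_shards
```

`hashlib` is used rather than Python's built-in `hash()` because the built-in is randomized per process for strings, so it would not route the same object to the same shard across restarts.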

Data Service

  • Manages storage nodes.
  • Uses an append-only Write-Ahead Log (WAL), packing many objects into large files for compact storage.
  • Stores each object's location (file and offset) in an embedded DB (e.g., SQLite).
  • Handles replication, heartbeats, and reads/writes.
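
The log-plus-index idea can be sketched with the standard library alone. This is a toy single-node version under stated assumptions: the `DataNode` class, the `object_location` table, and its column names are hypothetical, and replication and heartbeats are left out.

```python
import os
import sqlite3
import tempfile


class DataNode:
    """Toy storage node: one append-only log file plus a SQLite location index."""

    def __init__(self, directory):
        self.log_path = os.path.join(directory, "objects.log")
        self.index = sqlite3.connect(os.path.join(directory, "index.db"))
        self.index.execute(
            "CREATE TABLE IF NOT EXISTS object_location ("
            "object_id TEXT PRIMARY KEY, file_name TEXT, "
            "start_offset INTEGER, size INTEGER)"
        )

    def write(self, object_id, data):
        # Append the bytes to the end of the log and note where they landed.
        with open(self.log_path, "ab") as log:
            log.seek(0, os.SEEK_END)
            offset = log.tell()
            log.write(data)
        # Record the location; the connection context manager commits on success.
        with self.index:
            self.index.execute(
                "INSERT OR REPLACE INTO object_location VALUES (?, ?, ?, ?)",
                (object_id, self.log_path, offset, len(data)),
            )

    def read(self, object_id):
        row = self.index.execute(
            "SELECT file_name, start_offset, size FROM object_location "
            "WHERE object_id = ?",
            (object_id,),
        ).fetchone()
        if row is None:
            raise KeyError(object_id)
        file_name, offset, size = row
        with open(file_name, "rb") as log:
            log.seek(offset)
            return log.read(size)

    def close(self):
        self.index.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as directory:
        node = DataNode(directory)
        node.write("photos/cat.jpg", b"meow")
        node.write("photos/dog.jpg", b"woof!")
        print(node.read("photos/dog.jpg"))
        node.close()
```

Because small objects share one large file, the node avoids creating a separate file per object, and a read costs one index lookup plus one seek.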

S3 is designed around the concept of object storage, where data is stored as objects within buckets.

What are Objects:

An object is an immutable piece of…

  • data, a sequence of bytes, e.g. the blob of bytes of binary data that makes up an image, and
  • its separate but associated metadata, key-value pairs describing the actual data, e.g. the name of the image.

By immutable, we mean that an object can be deleted or overwritten as a whole, but not patched in place.
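
A toy in-memory store makes the "overwrite, don't patch" rule concrete. The `ObjectStore` class and its method names are hypothetical and exist only for this illustration.

```python
class ObjectStore:
    """Toy in-memory bucket: whole-object put, get, and delete only."""

    def __init__(self):
        self._objects = {}

    def put(self, key, data, metadata=None):
        # Writing to an existing key replaces the whole object and its metadata.
        self._objects[key] = (bytes(data), dict(metadata or {}))

    def get(self, key):
        data, metadata = self._objects[key]
        return data, dict(metadata)  # hand back a copy of the metadata

    def delete(self, key):
        # Deleting a missing key is a no-op in this toy version.
        self._objects.pop(key, None)


store = ObjectStore()
store.put("cat.jpg", b"hello", {"content-type": "image/jpeg"})

# There is no patch operation: to change one byte, read the object,
# build a new payload, and put the whole thing back under the same key.
data, metadata = store.get("cat.jpg")
store.put("cat.jpg", b"j" + data[1:], metadata)
```

The payload is stored as `bytes`, which Python itself refuses to modify in place, so the only way to change an object here is a full overwrite.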

Durability, Replication, & Integration
To achieve its famed eleven nines (99.999999999%) of durability, S3 automatically replicates each object across multiple Availability Zones within a region. If one data center goes offline, your data remains available from the others. S3 also plugs into other AWS services: Lambda can run code when objects arrive, CloudFront can cache and deliver them globally, and Glacier provides low-cost, long-term archival tiers.
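
To get a feel for what eleven nines means, the short calculation below treats the figure as an annual, per-object durability, so each object has a 1 in 10^11 chance of being lost in a given year. The function names are made up for the example, and exact fractions are used to avoid floating-point rounding at this scale.

```python
from fractions import Fraction

# 99.999999999% written as an exact fraction (eleven nines).
DURABILITY = Fraction(99_999_999_999, 100_000_000_000)
ANNUAL_LOSS_PROBABILITY = 1 - DURABILITY  # 1 in 10**11 per object per year


def expected_losses_per_year(num_objects: int) -> Fraction:
    """Expected number of objects lost per year for a given object count."""
    return num_objects * ANNUAL_LOSS_PROBABILITY


def years_per_expected_loss(num_objects: int) -> Fraction:
    """Average number of years between single-object losses."""
    return 1 / expected_losses_per_year(num_objects)


print(years_per_expected_loss(10_000_000))  # 10000
```

In other words, with ten million objects stored, you would expect to lose a single object about once every 10,000 years on average.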

Internal Architecture: API, Metadata & Data Services
When you request a bucket or object action, S3’s API layer first authenticates and authorizes the request via IAM. Behind that sits the Metadata Service, which maintains small, highly optimized tables of bucket and object records (sharded by a hash of bucket + object name for scale). The Data Service handles the heavy lifting: it groups disk drives into storage nodes and writes objects into append-only write-ahead logs, recording each object’s file and offset in an embedded index (e.g., SQLite) so it can be fetched quickly.
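
The read path described above can be traced end to end with plain dictionaries standing in for each layer. Everything here is a simplified assumption: the `get_object` function, the `(caller, action, bucket)` policy tuples, and the `ObjectRecord` fields are hypothetical stand-ins, not IAM's or S3's real data model.

```python
from dataclasses import dataclass


class AccessDenied(Exception):
    """Raised when the caller is not allowed to perform the action."""


@dataclass
class ObjectRecord:
    node: str       # storage node that holds the bytes
    file_name: str  # log file on that node
    offset: int     # where the object starts inside the log
    size: int       # object length in bytes


def get_object(caller, bucket, key, policies, metadata, data_nodes):
    # 1. API layer: check that the caller may read from this bucket.
    if (caller, "GetObject", bucket) not in policies:
        raise AccessDenied(f"{caller} may not read from {bucket}")
    # 2. Metadata Service: look up where the object lives.
    record = metadata[(bucket, key)]
    # 3. Data Service: read the byte range out of the node's log.
    log = data_nodes[record.node][record.file_name]
    return log[record.offset:record.offset + record.size]


policies = {("alice", "GetObject", "photos")}
metadata = {("photos", "cat.jpg"): ObjectRecord("node-1", "log-0001", 5, 3)}
data_nodes = {"node-1": {"log-0001": b"xxxxxCATyyy"}}

print(get_object("alice", "photos", "cat.jpg", policies, metadata, data_nodes))  # b'CAT'
```

The metadata lookup is small and fast, while the bytes are touched only in the last step, which is why the two services can be scaled independently.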

Amlan Bose

Written by Amlan Bose

Data Engineer with a knack for Doodling
