π§© Guide: Database Sharding β A Systems Design Deep Dive
Database sharding is a horizontal partitioning technique where large databases are split into smaller, faster, more manageable "shards" that are spread across multiple servers or nodes.
This is a foundational strategy for scaling read/write-heavy systems, used by platforms like Twitter, Amazon, and Facebook.
π§ 1. What Is Sharding?
Sharding = splitting a large dataset across multiple databases or storage systems, each handling only a subset of the total data.
Each shard is a self-contained unit responsible for a portion of the data.
Sharding vs Partitioning
Aspect | Sharding | Partitioning (in one DB) |
---|---|---|
Level | Across nodes | Inside a single DB server |
Scope | Horizontal scale-out | Logical organization |
Used for | Scaling and distribution | Query performance, archival |
βοΈ 2. Why Shard?
- Horizontal scalability: Handle large volumes of data and traffic
- Parallelism: Each shard handles its own read/write load
- Fault isolation: Failure in one shard β total failure
- Data locality: Can co-locate data near users (geo-sharding)
π§ 3. Sharding Strategies
1οΈβ£ Hash-Based Sharding
Use a hash of a key (e.g. user ID) to assign to a shard.
shard_id = hash(user_id) % num_shards
β
Simple and uniform
β Hard to rebalance when adding/removing shards
2οΈβ£ Range-Based Sharding
Assign data based on a value range.
Shard | User ID Range |
---|---|
1 | 1β1M |
2 | 1Mβ2M |
3 | 2Mβ3M |
β
Good for queries by range
β Risk of hotspots (e.g., last shard gets all new users)
3οΈβ£ Directory-Based (Lookup Table)
Use a central directory or metadata service to track where each record lives.
getShard(user_id) β lookup in user_shard_map
β
Dynamic and flexible
β Extra network hop; central point of failure unless replicated
4οΈβ£ Geo-Sharding
Shard based on location or geography (e.g. US, EU, Asia).
β
Complies with regulations (e.g. GDPR), improves latency
β Hard to rebalance globally, cross-region queries are expensive
πΊοΈ 4. Data Routing
How does your application know which shard to query?
- Client-based routing: Client calculates shard (
hash(id) % n
) - Proxy-based routing: Intermediate layer routes based on metadata
- Middleware/router (e.g. Vitess, Citus) intercepts and distributes queries
π§Ή 5. Rebalancing & Resharding
When you add or remove shards, you may need to reshuffle data:
- With hash sharding: Change in
num_shards
means all keys must be remapped - Solution: Use consistent hashing
ring = HashRing(shards)
shard = ring.get_node(user_id)
- Online Resharding Tools: Vitess, Facebook's Hydra, Instagramβs Shard Manager
ποΈ 6. Sharding Architectures
Shared Nothing
Each shard has its own DB server, schema, and hardware.
β
Fully independent
β More operational complexity
Shared Schema
All shards use the same schema; each handles only part of the data.
β
Easy to replicate/scale
β Requires careful query scoping
π§ͺ 7. Querying Sharded Databases
Query Type | Strategy |
---|---|
Point lookup | Route directly to correct shard |
Range query | Might hit multiple shards |
Cross-shard join | Avoid! Or pre-join in application |
Aggregation | Perform in parallel, merge in app layer |
// Application-side fan-out
for (shard in shards) {
results += query(shard, "SELECT COUNT(*) FROM orders")
}
return sum(results)
π 8. Consistency, Failover & Replication
- Each shard should have replication (primary + replicas)
- May use eventual consistency for performance
- Failover and monitoring per shard
Tools like Vitess, Citus, and CockroachDB handle this automatically.
β οΈ 9. Sharding Gotchas
Problem | Notes |
---|---|
Cross-shard joins | Expensive, often need application logic |
Rebalancing difficulty | Hard to add shards without downtime |
Operational complexity | Monitoring, backup, failover per shard |
Hotspots | Uneven traffic on certain shards |
Global uniqueness | Must generate unique IDs across shards |
π Solutions
- Use UUIDs or Snowflake IDs to ensure global uniqueness
- Implement query routers or gateways
- Use application-level join if needed
π§° 10. Tools & Technologies
Tool | Description |
---|---|
Vitess | MySQL sharding middleware, used by YouTube |
Citus | PostgreSQL extension for sharding |
CockroachDB | Globally distributed, auto-sharding |
MongoDB | Native sharding via config servers |
Cassandra | Peer-to-peer, partitioned by key |
π§ 11. When to Use Sharding
β Use sharding when:
- Your DB size or write traffic exceeds what a single machine can handle
- Youβve already optimized indexing, caching, and vertical scaling
- You need high availability across regions or fault domains
π« Avoid premature sharding β it's complex and operationally expensive
π Further Reading
- Sharding at Instagram (Engineering Blog)
- Vitess Overview
- Citus Sharding for Postgres
- High Scalability: YouTube's Architecture
- Google Spanner vs Sharded DBs
<< back to Guides