<< back to Guides

πŸ”„ Guide: Consistent Hashing β€” A Systems Design Deep Dive

Consistent hashing is a key technique in system design for evenly distributing load across dynamic server nodes without needing to reassign all keys when the cluster topology changes.

It is widely used in:


🎯 1. The Problem with Naive Hashing

Let’s say you distribute keys using:

node = hash(key) % N

❌ When N (the number of servers) changes, almost all keys are remapped, which causes:


🧠 2. What Is Consistent Hashing?

Consistent hashing solves this by mapping both nodes and keys onto the same circular hash space (a ring).

// Example: simplified concept
nodes = ["A", "B", "C"]
hash_ring = sorted([hash(n) for n in nodes])
key_hash = hash("user42")
target_node = find_first_node_clockwise(hash_ring, key_hash)

βœ… When a node is added/removed, only a small subset of keys move
βœ… Maintains balanced distribution and scales smoothly


πŸŒ€ 3. The Hash Ring

Visualize the hash space as a circle:

   0 ----------------> 2^32-1
   ^                  |
   |     Node B       |
   |     ↑            ↓
   |  Node A      Node C
   --------------------

🧩 4. Virtual Nodes (VNodes)

Virtual nodes improve balance and fault tolerance:

# Node A gets 100 virtual nodes
for i in range(100):
    ring.add(hash("A#" + str(i)))

βœ… Better load distribution
βœ… Easier rebalance during scale-up/down
βœ… Faster recovery after failure


πŸ”„ 5. When Nodes Join or Leave

βœ… Node Addition

❌ Node Removal

This locality and stability make consistent hashing ideal for dynamic systems.


πŸ“¦ 6. Use Cases

Use Case Why Consistent Hashing Helps
Distributed Caching Prevents cache misses on node changes
Database Sharding Smooth resharding with minimal movement
Load Balancing Assigns client requests to backend servers
Partitioned Logs Distributes partitions across brokers
Distributed File Stores Maps files to storage servers efficiently

βš™οΈ 7. Real-World Systems That Use It

System Use of Consistent Hashing
Amazon Dynamo Ring-based partitioning, VNodes
Cassandra VNodes + token ring for partitioning
Riak Core to key distribution and fault tolerance
Memcached Client-side hashing for cache routing
Envoy Proxy Uses it in load balancing algorithms
Kafka Custom partitioners based on hash ring

⚠️ 8. Pitfalls and Considerations

Issue Solution
Hash imbalance Use VNodes to smooth the ring
Hot keys (skew) Combine hashing + replication
Hash collisions Use strong, uniform hash function
Rebalancing complexity Use libraries/tools (e.g. Ketama)

πŸ› οΈ 9. Sample Implementation (Python-like Pseudocode)

class ConsistentHashRing:
    def __init__(self, nodes, vnodes=100):
        self.ring = {}
        self.sorted_keys = []
        for node in nodes:
            for i in range(vnodes):
                key = hash(f"{node}#{i}")
                self.ring[key] = node
                self.sorted_keys.append(key)
        self.sorted_keys.sort()

    def get_node(self, item_key):
        key = hash(item_key)
        for k in self.sorted_keys:
            if key <= k:
                return self.ring[k]
        return self.ring[self.sorted_keys[0]]  # Wrap around

βœ… Summary

Concept Description
Problem % N hashing remaps all keys on change
Solution Consistent hashing maps nodes + keys to ring
Benefit Only small subset of keys need to move
Advanced Feature Virtual nodes improve balance
Real-World Use Dynamo, Cassandra, Kafka, Envoy, Memcached

πŸ“š Further Reading


<< back to Guides