
🚦 Guide: Rate Limiting - A Systems Design Deep Dive

Rate limiting is a mechanism to control the rate of requests clients can make to a system. It’s essential in distributed systems for fairness, protection, and resource management.


🧠 1. What Is Rate Limiting?

Rate limiting restricts how many requests a client (IP, user, API key, etc.) can make in a given time window.

Why?


📐 2. Key Dimensions

| Dimension | Examples |
| --- | --- |
| Identity | IP address, user ID, access token |
| Scope | Per-endpoint, per-service, per-tenant |
| Granularity | Per second, minute, hour, day |
| Limit Type | Fixed, dynamic, bursty |

⚙️ 3. Rate Limiting Algorithms

⏱️ 1. Fixed Window Counter

# Key: user:123|minute:202506192103 (client ID plus the current minute)
counter = incr_counter(key)  # returns the count after incrementing
if counter > limit: block

✅ Simple
❌ Bursts at window boundaries: a client can send up to twice the limit in a short span that straddles two windows
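
To make the boundary problem concrete, here is a minimal in-memory sketch in Python. The class name `FixedWindowLimiter`, the injectable `now` argument, and the dictionary store are assumptions for illustration. A real deployment would likely keep the counters in a shared store and expire old keys.

```python
import time
from collections import defaultdict


class FixedWindowLimiter:
    """Counts requests per client within fixed, clock-aligned windows."""

    def __init__(self, limit, window_seconds=60):
        self.limit = limit
        self.window_seconds = window_seconds
        # (client, window index) -> request count; old keys are never
        # evicted in this sketch.
        self.counters = defaultdict(int)

    def allow(self, client, now=None):
        now = time.time() if now is None else now
        window = int(now // self.window_seconds)
        key = (client, window)
        self.counters[key] += 1
        return self.counters[key] <= self.limit


limiter = FixedWindowLimiter(limit=2, window_seconds=60)
# Two requests at the end of one window and two at the start of the next:
# all four pass within a few seconds, twice the nominal limit.
results = [limiter.allow("user:123", now=t) for t in (58, 59, 60, 61)]
```

All four calls are allowed because the counter resets at t=60, which is the burst behavior noted above.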


⌛ 2. Sliding Window Log

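timestamps = get_requests(user)
timestamps = [t for t in timestamps if t > now - 60]  # keep only the last 60 seconds
if len(timestamps) >= limit: block
else: record_request(user, now)  # log this request so later checks count it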

✅ Precise
❌ Memory grows with traffic
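
A runnable version of the same idea might look like the sketch below. It assumes an in-process `deque` per client. The class name `SlidingWindowLog` and the `now` parameter are illustrative, not taken from any library.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLog:
    """Keeps one timestamp per accepted request, per client."""

    def __init__(self, limit, window_seconds=60):
        self.limit = limit
        self.window_seconds = window_seconds
        self.logs = defaultdict(deque)  # client -> timestamps, oldest first

    def allow(self, client, now=None):
        now = time.time() if now is None else now
        log = self.logs[client]
        # Drop timestamps that have fallen out of the window.
        while log and log[0] <= now - self.window_seconds:
            log.popleft()
        if len(log) >= self.limit:
            return False
        log.append(now)  # only accepted requests are recorded
        return True
```

Memory use is proportional to the limit times the number of active clients, which is the cost listed above.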


📉 3. Sliding Window Counter (Leaky Approximation)

✅ Balanced burst tolerance and performance
❌ Slightly complex to implement
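
One common way to build this (an assumption here, since the guide does not spell out the method) is to keep two fixed-window counters and weight the previous window by how much of it still overlaps the sliding window: estimate = previous * (1 - elapsed fraction) + current. A minimal sketch with hypothetical names:

```python
import time


class SlidingWindowCounter:
    """Approximates a sliding window from two fixed-window counters."""

    def __init__(self, limit, window_seconds=60):
        self.limit = limit
        self.window_seconds = window_seconds
        self.windows = {}  # client -> (window index, current count, previous count)

    def allow(self, client, now=None):
        now = time.time() if now is None else now
        index = int(now // self.window_seconds)
        stored_index, current, previous = self.windows.get(client, (index, 0, 0))
        if index == stored_index + 1:
            # Moved into the next window: current becomes previous.
            previous, current = current, 0
        elif index != stored_index:
            # More than one window has passed: nothing overlaps any more.
            previous, current = 0, 0
        elapsed = (now % self.window_seconds) / self.window_seconds
        estimated = previous * (1 - elapsed) + current
        allowed = estimated < self.limit
        if allowed:
            current += 1
        self.windows[client] = (index, current, previous)
        return allowed
```

Only two counters per client are stored, so memory stays constant regardless of traffic, at the price of assuming requests were spread evenly across the previous window.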


🔄 4. Token Bucket

bucket = get_bucket(user)
refill(bucket)  # add rate * elapsed_time tokens, capped at the bucket capacity
if bucket.tokens >= 1:
  bucket.tokens -= 1
  allow
else:
  reject

✅ Allows short bursts
✅ Ideal for APIs and microservices
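
The pseudocode above leaves out how tokens come back. A minimal sketch with lazy refill, computed on each call from the elapsed time, is shown below. The names `TokenBucket`, `capacity`, `refill_rate`, and `cost` are assumptions for illustration.

```python
import time


class TokenBucket:
    """Token bucket with lazy refill computed from elapsed time."""

    def __init__(self, capacity, refill_rate, now=None):
        self.capacity = capacity
        self.refill_rate = refill_rate  # tokens added per second
        self.tokens = float(capacity)   # start full so an initial burst passes
        self.last_refill = time.monotonic() if now is None else now

    def allow(self, now=None, cost=1):
        now = time.monotonic() if now is None else now
        elapsed = max(0.0, now - self.last_refill)
        # Refill for the time that has passed, never above capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_refill = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

Here `capacity` bounds the burst size and `refill_rate` sets the sustained rate, so the two can be tuned independently.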


🪣 5. Leaky Bucket

✅ Controls average rate
❌ No burst allowance: requests drain at a constant rate, and excess requests wait or are dropped
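
As a rough sketch, the queue-based variant of the leaky bucket can be modeled as a bounded queue drained at a fixed interval. The class and method names (`LeakyBucket`, `submit`, `drain`) are hypothetical, and the caller is assumed to call `drain` periodically.

```python
from collections import deque


class LeakyBucket:
    """Bounded queue that releases requests at a constant rate."""

    def __init__(self, capacity, leak_rate):
        self.capacity = capacity
        self.leak_interval = 1.0 / leak_rate  # seconds between releases
        self.queue = deque()
        self.next_leak = 0.0  # earliest time the next request may be released

    def submit(self, request, now):
        """Queue a request; return False if the bucket is full."""
        if len(self.queue) >= self.capacity:
            return False
        if not self.queue:
            # After an idle period, do not let past time count as release credit.
            self.next_leak = max(self.next_leak, now)
        self.queue.append(request)
        return True

    def drain(self, now):
        """Return the requests due by `now`, one per leak interval."""
        released = []
        while self.queue and self.next_leak <= now:
            released.append(self.queue.popleft())
            self.next_leak += self.leak_interval
        return released
```

Unlike the token bucket, a burst is never passed through at once: it is either smoothed out over time or rejected when the queue is full.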


🧰 4. Where to Implement Rate Limiting?

| Layer | Use Case | Tools / Examples |
| --- | --- | --- |
| Client SDK | Preemptive throttling | Retry logic, backoff |
| API Gateway | Global protection | Kong, Envoy, NGINX, AWS API Gateway |
| Backend | Per-service business limits | Redis, Memcached |
| Middleware | Shared logic across endpoints | Express/Flask middleware |
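
For the client SDK row, the retry logic could be sketched as exponential backoff with jitter that defers to a server-provided delay when one is available. The function name `retry_delay` and its defaults (`base`, `cap`) are assumptions, not part of any particular SDK.

```python
import random


def retry_delay(attempt, retry_after=None, base=0.5, cap=30.0, rng=random.random):
    """Seconds to wait before retry number `attempt` (0-based)."""
    if retry_after is not None:
        # A delay supplied by the server takes priority over local backoff.
        return float(retry_after)
    ceiling = min(cap, base * (2 ** attempt))
    # Full jitter: pick a uniform delay in [0, ceiling) so clients do not
    # retry in lockstep.
    return rng() * ceiling
```

The `rng` parameter exists only so the jitter can be replaced with a fixed value in tests.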

📊 5. Rate Limiting in Redis

Redis is commonly used due to its atomic operations and expiry features:

count = INCR user:123
IF count == 1 → EXPIRE user:123 60   (set the TTL only when the window starts)
IF count > 100 → block

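Wrapping INCR and EXPIRE in a Lua script, or creating the key with SET using the NX and EX options before incrementing, makes the counter and its TTL atomic, so a crash between the two commands cannot leave a key that never expires.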
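
In application code, the counter pattern might look like the sketch below. It assumes a client object exposing `incr` and `expire`, as redis-py's `redis.Redis` does. `allow_request` is a hypothetical helper, and `FakeRedis` is an in-memory stand-in so the example runs without a server.

```python
def allow_request(client, key, limit=100, window_seconds=60):
    """Fixed-window check: INCR the key and set its TTL on first use."""
    count = client.incr(key)
    if count == 1:
        # First request in this window starts the expiry clock. This is a
        # separate command, so it is not atomic with the increment.
        client.expire(key, window_seconds)
    return count <= limit


class FakeRedis:
    """In-memory stand-in exposing only incr/expire (TTLs are recorded, not enforced)."""

    def __init__(self):
        self.values = {}
        self.ttls = {}

    def incr(self, key):
        self.values[key] = self.values.get(key, 0) + 1
        return self.values[key]

    def expire(self, key, seconds):
        self.ttls[key] = seconds
        return True


client = FakeRedis()
results = [allow_request(client, "user:123", limit=2) for _ in range(3)]
```

With a real server, `client` would be a `redis.Redis` instance and the key would disappear after `window_seconds`, resetting the count.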


⚠️ 6. Distributed Rate Limiting

Challenges:

Solutions:


🔄 7. Retry-After and Headers

Always provide rate-limit headers:

| Header | Description |
| --- | --- |
| X-RateLimit-Limit | Total allowed requests |
| X-RateLimit-Remaining | Remaining requests in window |
| Retry-After | Time to wait before retrying |

This helps clients throttle themselves gracefully.
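
A server-side helper that builds these headers could look like the following sketch. The function name `rate_limit_response` and its parameters are assumptions, and `used` is taken to include the current request.

```python
import math


def rate_limit_response(limit, used, seconds_until_reset):
    """Return (status code, headers) for a request that brings the count to `used`."""
    headers = {
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": str(max(0, limit - used)),
    }
    if used > limit:
        # Over the limit: respond 429 and tell the client how long to wait,
        # rounded up to whole seconds.
        headers["Retry-After"] = str(math.ceil(seconds_until_reset))
        return 429, headers
    return 200, headers
```

For example, `rate_limit_response(100, 101, 12.3)` yields status 429 with `Retry-After` set to `"13"`.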


🧠 8. Best Practices


🧱 9. Real-World Examples

| Service | Strategy |
| --- | --- |
| GitHub API | 5,000 requests per hour per authenticated user |
| Twitter API | Varies by endpoint and access tier |
| Cloudflare | Custom global rate limits at the edge |
| Stripe | Burst + steady limits, HTTP 429 with Retry-After |
| AWS API Gateway | Token bucket model per stage + method |

✅ Summary

| Aspect | Key Points |
| --- | --- |
| Purpose | Protect systems from abuse and overload |
| Algorithms | Fixed window, sliding window, token/leaky bucket |
| Deployment | Gateway, backend, middleware |
| Tools | Redis, Envoy, Kong, NGINX, AWS, Cloudflare |
