Guide: Rate Limiting - A Systems Design Deep Dive
Rate limiting is a mechanism to control the rate of requests clients can make to a system. It's essential in distributed systems for fairness, protection, and resource management.
1. What Is Rate Limiting?
Rate limiting restricts how many requests a client (IP, user, API key, etc.) can make in a given time window.
Why?
- Prevent abuse and DoS attacks
- Protect backend resources
- Ensure fair usage (multi-tenant systems)
- Enforce SLAs or pricing tiers
- Throttle traffic during peak load
2. Key Dimensions
Dimension | Examples |
---|---|
Identity | IP address, user ID, access token |
Scope | Per-endpoint, per-service, per-tenant |
Granularity | Per second, minute, hour, day |
Limit Type | Fixed, dynamic, bursty |
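
These dimensions usually end up encoded in the key that a counter is stored under. The sketch below shows one way to do that; the function name, the `|` separator, and the format table are illustrative assumptions, chosen to match the `user:123|minute:202506192103` style of key used later in this guide.

```python
from datetime import datetime, timezone

# Hypothetical mapping from granularity to a time-bucket format
_BUCKET_FORMATS = {
    "second": "%Y%m%d%H%M%S",
    "minute": "%Y%m%d%H%M",
    "hour": "%Y%m%d%H",
    "day": "%Y%m%d",
}

def rate_limit_key(identity: str, scope: str, granularity: str, now: datetime) -> str:
    """Combine identity, scope, and a time bucket into one counter key."""
    bucket = now.strftime(_BUCKET_FORMATS[granularity])
    return f"{identity}|{scope}|{granularity}:{bucket}"

# Example call with an explicit UTC timestamp
key = rate_limit_key(
    "user:123", "GET /orders", "minute",
    datetime(2025, 6, 19, 21, 3, 45, tzinfo=timezone.utc),
)
```

Every request from the same identity, to the same scope, within the same minute maps to the same key, so a single counter covers them.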
3. Rate Limiting Algorithms
1. Fixed Window Counter
- Track count of requests in a fixed time window
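```python
# Key: user:123|minute:202506192103 (client plus the current minute)
count = incr_counter(key)  # assumed to increment atomically and return the new count
if count > limit:
    block()
```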
✅ Simple
❌ Allows bursts at window boundaries: a client can send up to twice the limit in a short span that straddles two windows
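
A minimal in-memory sketch of the fixed window counter follows. The class name and the `now` parameter (injected so the behaviour can be tested without sleeping) are assumptions for illustration; a production version would also evict counters for old windows.

```python
import time
from collections import defaultdict
from typing import Optional

class FixedWindowLimiter:
    def __init__(self, limit: int, window_seconds: int = 60):
        self.limit = limit
        self.window_seconds = window_seconds
        # (client, window index) -> request count; old windows are never evicted here
        self.counters = defaultdict(int)

    def allow(self, client: str, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        window = int(now // self.window_seconds)  # same index for the whole window
        key = (client, window)
        if self.counters[key] >= self.limit:
            return False
        self.counters[key] += 1
        return True
```

With `limit=2` and a 60-second window, two requests at t=59 and two more at t=60 are all allowed, because they fall into different windows. That is the boundary burst listed as the drawback above.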
2. Sliding Window Log
- Store timestamps of each request and expire old ones
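```python
timestamps = get_requests(user)                       # stored request times, in seconds
timestamps = [t for t in timestamps if t > now - 60]  # drop entries older than 60 s
if len(timestamps) >= limit:
    block()
else:
    timestamps.append(now)  # record the allowed request
```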
✅ Precise
❌ Memory grows with traffic
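
The same idea as a self-contained sketch, using a per-client deque so expired timestamps can be dropped from the front cheaply. The class name and the injectable `now` argument are illustrative assumptions.

```python
import time
from collections import defaultdict, deque
from typing import Optional

class SlidingWindowLog:
    def __init__(self, limit: int, window_seconds: float = 60.0):
        self.limit = limit
        self.window_seconds = window_seconds
        self.logs = defaultdict(deque)  # client -> timestamps of allowed requests

    def allow(self, client: str, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        log = self.logs[client]
        cutoff = now - self.window_seconds
        # Timestamps are appended in order, so expired ones sit at the front
        while log and log[0] <= cutoff:
            log.popleft()
        if len(log) >= self.limit:
            return False
        log.append(now)
        return True
```

Each client stores up to `limit` timestamps, which is where the memory cost comes from.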
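3. Sliding Window Counter (Leaky Approximation)
- Smooth the fixed window by interpolation: weight the previous window's count by the fraction of it that still overlaps the sliding window, then add the current window's count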
✅ Balanced burst tolerance and performance
❌ Slightly more complex to implement, and the count is an estimate
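
One common way to do the interpolation is to estimate the sliding count as `previous * (1 - elapsed_fraction) + current`, where `elapsed_fraction` is how far the clock is into the current fixed window. The sketch below assumes that formula; the class name, the tuple layout, and the "reject if the estimate plus this request exceeds the limit" rule are illustrative choices, not the only valid ones.

```python
import time
from typing import Optional

class SlidingWindowCounter:
    def __init__(self, limit: int, window_seconds: float = 60.0):
        self.limit = limit
        self.window_seconds = window_seconds
        self.state = {}  # client -> (window index, current count, previous count)

    def allow(self, client: str, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        window = int(now // self.window_seconds)
        index, current, previous = self.state.get(client, (window, 0, 0))
        if window == index + 1:      # rolled into the next window
            previous, current = current, 0
        elif window > index + 1:     # idle for a full window or more
            previous, current = 0, 0
        elapsed = (now % self.window_seconds) / self.window_seconds
        # Weight the previous window by the share that still overlaps the sliding window
        estimate = previous * (1.0 - elapsed) + current
        allowed = estimate + 1 <= self.limit
        if allowed:
            current += 1
        self.state[client] = (window, current, previous)
        return allowed
```

Only two counters per client are stored, regardless of traffic, which is the trade-off against the exact but memory-hungry log.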
4. Token Bucket
- Tokens are added at a fixed rate; requests consume tokens
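```python
bucket = get_bucket(user)
# Refill for the time since the last check, capped at capacity
# (capacity, last_refill, and refill_rate are assumed fields/settings)
elapsed = now - bucket.last_refill
bucket.tokens = min(bucket.capacity, bucket.tokens + elapsed * refill_rate)
bucket.last_refill = now
if bucket.tokens >= 1:
    bucket.tokens -= 1
    allow()
else:
    reject()
```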
✅ Allows short bursts
✅ Ideal for APIs and microservices
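
A runnable sketch of a single bucket is shown below. It refills lazily on each call instead of using a background timer; the class name, the `cost` parameter, and the injectable `now` are assumptions made for illustration.

```python
import time
from typing import Optional

class TokenBucket:
    def __init__(self, capacity: float, refill_rate: float, now: Optional[float] = None):
        self.capacity = capacity        # maximum burst size
        self.refill_rate = refill_rate  # tokens added per second
        self.tokens = capacity          # start with a full bucket
        self.last_refill = time.monotonic() if now is None else now

    def allow(self, now: Optional[float] = None, cost: float = 1.0) -> bool:
        now = time.monotonic() if now is None else now
        elapsed = max(0.0, now - self.last_refill)
        # Lazy refill: add tokens for the elapsed time, never above capacity
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_refill = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

`capacity` sets how large a burst is tolerated and `refill_rate` sets the sustained rate, so the two can be tuned independently.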
5. Leaky Bucket
- Queue-like system where requests drip out at constant rate
✅ Controls average rate
❌ Harder to burst: requests beyond the queue size are dropped, and queued requests wait
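
The queue behaviour can be sketched as follows. This is one possible reading of the leaky bucket (a bounded queue drained at a fixed interval); the method names `offer` and `drain` and the explicit `now` arguments are hypothetical, and the caller is assumed to poll `drain` regularly.

```python
from collections import deque

class LeakyBucket:
    def __init__(self, capacity: int, leak_rate: float):
        self.capacity = capacity         # maximum number of queued requests
        self.interval = 1.0 / leak_rate  # seconds between released requests
        self.queue = deque()
        self.next_release = 0.0          # earliest time the next request may leave

    def offer(self, request, now: float) -> bool:
        """Queue a request, or return False if the bucket is full."""
        if len(self.queue) >= self.capacity:
            return False
        if not self.queue:
            # Do not let idle time turn into a burst later
            self.next_release = max(self.next_release, now)
        self.queue.append(request)
        return True

    def drain(self, now: float) -> list:
        """Return the requests that are due to leave by time `now`."""
        released = []
        while self.queue and self.next_release <= now:
            released.append(self.queue.popleft())
            self.next_release += self.interval
        return released
```

Unlike the token bucket, a burst of arrivals is never passed through as a burst: it is either queued and released at the fixed interval or rejected once the queue is full.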
4. Where to Implement Rate Limiting?
Layer | Use Case | Tools / Examples |
---|---|---|
Client SDK | Preemptive throttling | Retry logic, backoff |
API Gateway | Global protection | Kong, Envoy, NGINX, AWS API Gateway |
Backend | Per-service business limits | Redis, Memcached |
Middleware | Shared logic across endpoints | Express/Flask middleware |
5. Rate Limiting in Redis
Redis is commonly used due to its atomic operations and expiry features:
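```
count = INCR user:123
IF count == 1 → EXPIRE user:123 60
IF count > 100 → block
```
Set the expiry only when `INCR` returns 1. Calling `EXPIRE` on every request keeps pushing the TTL forward, so a steady client would never get a fresh window. `INCR` and `EXPIRE` sent as separate commands are also not atomic: if the client fails between them, the key never expires. An atomic Lua script, or creating the key together with its TTL (`SET ... NX EX`, the modern form of `SETNX`), closes that gap and improves precision.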
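
The command sequence above might look like this in application code. The function is a sketch: `client` is assumed to be any object with redis-py style `incr(name)` and `expire(name, time)` methods (for example `redis.Redis(...)`), and the function name and defaults are hypothetical. It is not atomic across the two calls, so a Lua script is still the safer option in production.

```python
def allow_request(client, key: str, limit: int = 100, window_seconds: int = 60) -> bool:
    """Fixed-window check against a Redis-like client.

    Assumes client.incr(key) atomically increments and returns the new
    value, and client.expire(key, seconds) sets a TTL on the key.
    """
    count = client.incr(key)  # creates the key with value 1 if it is missing
    if count == 1:
        # Start the window on the first request only, so later
        # requests do not keep extending the TTL
        client.expire(key, window_seconds)
    return count <= limit
```

Passing the client in as an argument keeps the function easy to test with a stub and leaves connection handling to the caller.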
6. Distributed Rate Limiting
Challenges:
- Shared state across nodes
- Atomic counters under high load
- Replication lag and clock drift
Solutions:
- Use Redis with TTL and Lua
- Use distributed coordination (etcd, Consul)
- External rate limiting services (Cloudflare, AWS)
7. Retry-After and Headers
Always provide rate-limit headers:
Header | Description |
---|---|
X-RateLimit-Limit | Total allowed requests |
X-RateLimit-Remaining | Remaining requests in window |
Retry-After | Time to wait before retrying |
This helps clients throttle themselves gracefully.
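
A small helper for building these headers could look like the sketch below. The function name and parameters are assumptions; `Retry-After` is emitted as whole seconds and only when a retry delay is supplied, which would normally be on an HTTP 429 response.

```python
import math
from typing import Dict, Optional

def rate_limit_headers(limit: int, remaining: int,
                       retry_after: Optional[float] = None) -> Dict[str, str]:
    """Build rate-limit response headers as strings."""
    headers = {
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": str(max(0, remaining)),  # never report a negative value
    }
    if retry_after is not None:
        # Round up so clients do not retry a fraction of a second too early
        headers["Retry-After"] = str(max(0, math.ceil(retry_after)))
    return headers
```

For an allowed request, `rate_limit_headers(100, 42)` yields only the two `X-RateLimit-*` headers; for a rejected one, passing the seconds left in the window adds `Retry-After`.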
8. Best Practices
- Separate limits for read vs write
- Apply stricter limits to anonymous users
- Use exponential backoff on retry
- Log rate-limited requests for observability
- Allow whitelisting for trusted services
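
As an illustration of the backoff recommendation, the generator below produces capped exponential delays with full jitter (each delay is a random value between zero and the current ceiling). The function name, defaults, and the injectable `rng` are assumptions for the sketch.

```python
import random
from typing import Callable, Iterator

def backoff_delays(base: float = 0.5, factor: float = 2.0, cap: float = 30.0,
                   retries: int = 5,
                   rng: Callable[[], float] = random.random) -> Iterator[float]:
    """Yield one delay (in seconds) per retry attempt."""
    for attempt in range(retries):
        ceiling = min(cap, base * factor ** attempt)  # 0.5, 1, 2, 4, ... up to cap
        yield rng() * ceiling                         # full jitter within the ceiling
```

A client would sleep for each yielded delay before retrying, and should prefer the server's `Retry-After` value when one is present. The jitter keeps many clients from retrying at the same instant.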
9. Real-World Examples
Service | Strategy |
---|---|
GitHub API | 5,000 requests per hour per authenticated user |
Twitter API | Varies by endpoint and access tier |
Cloudflare | Custom global rate limits at the edge |
Stripe | Burst + steady limits, HTTP 429 with Retry-After |
AWS API Gateway | Token bucket model per stage + method |
Summary
Aspect | Key Points |
---|---|
Purpose | Protect systems from abuse and overload |
Algorithms | Fixed window, sliding window, token/leaky bucket |
Deployment | Gateway, backend, middleware |
Tools | Redis, Envoy, Kong, NGINX, AWS, Cloudflare |