<< back to Guides
<< back to Guides
βοΈ Guide: Load Balancing β A Systems Design Deep Dive
Load balancing is a critical pattern in distributed systems, ensuring high availability, fault tolerance, and scalable performance by distributing traffic across multiple backend components.
This guide explores:
- What load balancing is and why it matters
- Layer 4 vs Layer 7 strategies
- Algorithms, health checks, session handling
- Real-world implementations and pitfalls
π¦ 1. What is Load Balancing?
Load balancing is the process of distributing incoming network traffic across multiple backend servers (or nodes) to ensure:
- No single server is overwhelmed
- High availability during failure
- Efficient utilization of resources
- Lower latency and higher throughput
Basic Architecture
Client ββββββ
ββββΆ Load Balancer βββΆ Server 1
ββββΆ βββΆ Server 2
ββββΆ βββΆ Server 3
βοΈ 2. Types of Load Balancing
π§± Layer 4: Transport-level (TCP/UDP)
- Works at network layer
- Balances IP packets
- Fast, simple, protocol-agnostic
- Doesnβt understand HTTP, gRPC, etc.
Example tools: HAProxy (TCP mode), AWS NLB
frontend tcp-in
bind *:443
default_backend web-servers
backend web-servers
balance roundrobin
server s1 10.0.0.1:443 check
server s2 10.0.0.2:443 check
π Layer 7: Application-level (HTTP/HTTPS)
- Understands protocols like HTTP, gRPC
- Can inspect headers, URLs, cookies
- Allows advanced routing, auth, A/B testing
Example tools: NGINX, Envoy, AWS ALB, Traefik
location /api/v1/ {
proxy_pass http://backend-api-v1;
}
π 3. Load Balancing Algorithms
Algorithm | Description | Use Case |
---|---|---|
Round Robin | Evenly distribute requests in order | Stateless services |
Least Connections | Send to server with fewest active requests | Long-lived or variable load |
IP Hash | Hash IP to route consistently | Session affinity |
Random | Select randomly from pool | Simple but not optimal |
Weighted | Favor stronger machines | Mixed-capacity nodes |
β€οΈ 4. Session Persistence ("Sticky Sessions")
Some apps (like login sessions) need a user to always talk to the same backend server.
Options:
- IP Hashing: Simple but not accurate under NAT
- Cookie-Based: LB injects cookie to maintain affinity
- Session Stores: Use Redis or DB for external state
// In NGINX with sticky cookie
upstream app {
sticky cookie srv_id expires=1h domain=.yourapp.com path=/;
server 10.0.0.1;
server 10.0.0.2;
}
π οΈ 5. Health Checks & Failover
Liveness and readiness checks prevent LBs from routing traffic to unhealthy nodes.
- Active checks: Ping endpoints or make synthetic requests
- Passive checks: Observe failed responses and eject nodes
// AWS-style HTTP health check
GET /healthz HTTP/1.1
Host: your-service.com
-- returns HTTP 200 if healthy
π 6. Load Balancers in Multi-tier Systems
Client β Global LB (GeoDNS / Anycast)
β Regional LB (CDN / Edge)
β App LB (L7 routing)
β Internal LB (DB / Cache sharding)
Each layer may use a different technique and tool, e.g.:
Layer | Tool |
---|---|
Global | Cloudflare, Route53, GSLB |
Regional | Fastly, Akamai, AWS CloudFront |
L7 App Balancer | Envoy, NGINX, ALB |
Internal L4 | HAProxy, NLB, gRPC round robin |
π§ 7. Trade-offs & Gotchas
Concern | Watch Out For |
---|---|
Session Affinity | Can break scalability if not managed |
Health Check Frequency | Too aggressive = false positives |
DNS TTL | Short TTL needed for dynamic IPs |
Cache Invalidation | Some LBs cache incorrectly at L7 |
Cold Starts | New containers might fail health checks |
Coordinated Omission | Donβt ignore tail latencies in tests |
π 8. Real-World Load Balancers
Tool | Type | Notes |
---|---|---|
NGINX | L7 | Popular reverse proxy & HTTP LB |
HAProxy | L4/L7 | Fast, widely used in infra |
Envoy | L7 | gRPC/native HTTP2, dynamic config |
AWS ALB/NLB | L7/L4 | Fully managed, auto-scaling |
Traefik | L7 | Modern, cloud-native, dynamic |
Kubernetes Services | L4/L7 | Built-in kube-proxy & ingress |
π§ͺ 9. Benchmarking Load Balancer Performance
To test throughput, latency, error recovery:
- Use tools like wrk, hey, vegeta, or Locust
- Simulate burst traffic and failure conditions
- Measure tail latencies (P95, P99) and error rates
# Example: 100 connections for 30s test
wrk -t10 -c100 -d30s http://localhost:8080
π§° 10. Load Balancing in Kubernetes
- Service: Basic L4 load balancer via kube-proxy
- Ingress Controller: NGINX/Traefik/HAProxy handling L7 HTTP traffic
- Service Mesh: Istio, Linkerd use Envoy sidecars to handle intra-cluster traffic
β Summary Table
Feature | L4 Load Balancer | L7 Load Balancer |
---|---|---|
Protocol Awareness | TCP/UDP only | HTTP, HTTPS, gRPC aware |
Routing Logic | IP/Port based | URL, header, cookie |
Performance | Faster | Slight overhead |
Flexibility | Lower | Very flexible |
Use Case | DBs, gRPC, raw APIs | Web apps, APIs, proxies |
π Further Reading
- NGINX Load Balancing Guide
- HAProxy Configuration Examples
- AWS ALB vs NLB Comparison
- Google SRE Book - Load Balancing
- Envoy Proxy Docs
<< back to Guides