⚡ Performance Benchmarking Guide for High-Performance Databases
Designing meaningful, reproducible benchmarks is essential for tuning databases under real-world workloads. This guide outlines key principles, strategies, and pitfalls to avoid when benchmarking latency-sensitive systems like high-performance databases, caches, and analytics engines.
🎯 Goals
- ✅ Tune query and traffic patterns based on workload reality
- ✅ Measure warm-up phases, throttling, and internal database behaviors
- ✅ Correlate latency, throughput, and resource usage
- ✅ Produce clear, reproducible, and actionable reports
1. 🧪 Designing Real-World Benchmarks
"If your benchmark doesn't reflect real-world usage, your tuning won't either."
✅ Strategies
- Use realistic queries (not synthetic SELECT 1s)
- Include read/write mixes (e.g. 80/20 or 50/50)
- Simulate production traffic patterns (e.g. bursts, idle periods)
- Include warm-up phase to account for caches, JIT, etc.
- Benchmark against real data sizes, not toy datasets
# Example: Load test with realistic query patterns
wrk -t4 -c100 -d30s --script=benchmark.lua http://dbproxy.local/query
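The strategies above call for a realistic read/write mix and bursty traffic rather than a flat request stream. A minimal client-side sketch of that shape, assuming hypothetical run_read() and run_write() helpers that issue your real benchmark queries; the 80/20 split, burst size, and idle gap are illustrative only:
# Example (sketch): 80/20 read/write mix with bursts and idle periods
import random
import time

def run_mixed_workload(duration_s=60, burst_size=200, idle_s=2.0, read_ratio=0.8):
    """Drive a bursty, mixed workload; run_read/run_write are placeholders."""
    end = time.time() + duration_s
    while time.time() < end:
        for _ in range(burst_size):       # Burst: issue burst_size requests back-to-back
            if random.random() < read_ratio:
                run_read()                # e.g. point lookup or range scan
            else:
                run_write()               # e.g. INSERT/UPDATE on the same tables
        time.sleep(idle_s)                # Idle gap between bursts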
❌ Pitfalls
- Measuring only during cold starts (no warm-up)
- Ignoring connection pool reuse
- Testing with unbounded workloads (causes overload)
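The connection-pool pitfall above is easy to introduce by accident when every request opens a fresh connection. Below is a sketch of pooled reuse with psycopg2's SimpleConnectionPool; the DSN, pool bounds, and query are placeholder values:
# Example (sketch): reuse pooled connections instead of reconnecting per query
from psycopg2 import pool   # pip install psycopg2-binary

db_pool = pool.SimpleConnectionPool(
    1, 20, dsn="dbname=bench user=bench host=dbproxy.local"
)

def timed_query(sql):
    conn = db_pool.getconn()          # Borrow an existing connection
    try:
        with conn.cursor() as cur:
            cur.execute(sql)
            cur.fetchall()
    finally:
        db_pool.putconn(conn)         # Return it to the pool instead of closing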
2. ⚙️ Infrastructure Sizing & Concurrency
"Always rightsize infra to your workload, not the other way around."
✅ Tips
- Match number of virtual users (VUs) to peak concurrency expectations
- Add latency injection to model network/GC/jitter behavior (a client-side sketch follows the k6 example below)
- Measure and report CPU, memory, disk IOPS, and network bandwidth
# Example: k6 test with ramping VUs and a request-rate cap
export let options = {
  stages: [
    { duration: '1m', target: 50 },
    { duration: '5m', target: 100 },
  ],
  rps: 100,
};
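For the latency-injection tip above, one simple approach is to add jitter client-side so results do not look unrealistically clean. A sketch in Python (rather than k6), with run_query() as a placeholder for the real request and jitter parameters chosen arbitrarily:
# Example (sketch): inject random network/GC-style jitter before each request
import random
import time

def run_query_with_jitter(base_jitter_ms=1.0, spike_prob=0.01, spike_ms=50.0):
    jitter_ms = random.expovariate(1.0 / base_jitter_ms)   # Small per-request network jitter
    if random.random() < spike_prob:
        jitter_ms += spike_ms                               # Rare spike, modelling a GC pause
    time.sleep(jitter_ms / 1000.0)
    return run_query()                                      # Placeholder benchmark request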
3. ⚠️ Recognize and Mitigate Coordinated Omission
"Your benchmark might lie if you don't simulate realistic delays."
What Is It?
If your load generator waits for each response before sending the next request, it quietly backs off whenever the system stalls, so your measurements miss the long delays that real, independently arriving clients would have experienced during overload.
✅ Solutions
- Use HdrHistogram's corrected-recording APIs, which back-fill the samples a stalled load generator failed to send
- Dispatch requests on a fixed schedule (open model) instead of waiting for each response before sending the next (closed model)
# Use HdrHistogram's corrected recording: back-fills samples for missed sends
recordValueWithExpectedInterval(actual_latency_ms, expected_interval_ms)
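A sketch of the scheduled (open-model) alternative: each request has an intended send time, and latency is measured from that intended time, so backlog that builds up during a stall is counted instead of silently dropped. run_query() and the 100 RPS target are assumptions:
# Example (sketch): open-model dispatch; measure from intended send time
import time

def scheduled_load(target_rps=100, duration_s=30):
    interval = 1.0 / target_rps
    start = time.perf_counter()
    latencies_ms = []
    for i in range(int(target_rps * duration_s)):
        intended = start + i * interval        # When this request *should* go out
        now = time.perf_counter()
        if now < intended:
            time.sleep(intended - now)         # Hold the fixed schedule
        run_query()                            # Placeholder benchmark request
        done = time.perf_counter()
        latencies_ms.append((done - intended) * 1000)   # Includes queueing behind a stall
    return latencies_ms
Because each sample is measured from the intended dispatch time, one slow response inflates the recorded latency of every request queued behind it, which is exactly the delay a closed-loop benchmark hides.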
4. 📊 Understand Latency Distributions (Not Just Averages)
"P99 latency tells you what your worst users experience."
✅ Key Metrics
| Metric      | Why It Matters                         |
|-------------|----------------------------------------|
| Avg latency | Misleading in non-normal distributions |
| P95 / P99   | Critical for SLAs, SLOs                |
| Max latency | Surfaces worst-case stalls             |
| Std. dev    | Variability across requests            |
# Example: wrapping a test with a latency histogram
from hdrh.histogram import HdrHistogram   # pip install hdrhistogram

hist = HdrHistogram(1, 3600000, 3)        # Track 1 ms to 1 hour at 3 significant digits
for latency in results:                   # results: iterable of latencies in ms
    hist.record_value(latency)
print("P95:", hist.get_value_at_percentile(95))
print("P99:", hist.get_value_at_percentile(99))
5. 🔍 Measure Warm-up, Throttling, and Internals
✅ Include:
- Warm-up phases: discard or tag first X seconds of results
- Throttling tests: apply capped RPS to observe graceful degradation
- GC pauses / compactions: log at high granularity
- Connection churn: does latency spike with reconnects? (a sketch follows the warm-up example below)
# Warm-up example: discard results from the first WARMUP_SECONDS
import time

WARMUP_SECONDS = 30
start_time = time.time()
for _ in range(num_requests):              # num_requests: total queries to issue
    latency = run_query()
    if time.time() - start_time > WARMUP_SECONDS:
        record(latency)                    # Only record after the warm-up window
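For the connection-churn point above, a sketch that forces a reconnect every N requests and tags those samples, so post-reconnect latency can be compared against steady-state latency; new_connection() and run_query(conn) are placeholders:
# Example (sketch): tag samples taken right after a forced reconnect
import time

def churn_test(num_requests=10_000, reconnect_every=500):
    samples = []                                # (latency_ms, "fresh" or "reused")
    conn = new_connection()
    for i in range(num_requests):
        fresh = i > 0 and i % reconnect_every == 0
        if fresh:
            conn.close()
            conn = new_connection()             # Force connection churn
        t0 = time.perf_counter()
        run_query(conn)
        latency_ms = (time.perf_counter() - t0) * 1000
        samples.append((latency_ms, "fresh" if fresh else "reused"))
    return samples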
6. 📈 Latency vs Throughput vs Resource Usage
"A high-QPS system that spikes CPU at 80% isn't fast; it's on fire."
✅ Correlate:
- Latency vs CPU usage
- Throughput vs memory pressure
- Throughput vs GC/compaction
- IO Wait vs throughput under heavy writes
Use tools like:
- CLI: dstat, htop, vmstat, iostat, perf
- Grafana + Prometheus dashboards
- Cloud-native: AWS CloudWatch, GCP Ops Agent
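To line latency up with resource usage, one option is to sample system metrics in a background thread while the load runs and join the two series on timestamps afterwards (in Grafana or a notebook). A sketch assuming psutil is installed:
# Example (sketch): sample CPU and memory in the background during a run
import threading
import time
import psutil   # pip install psutil

def sample_resources(samples, stop_event, interval_s=1.0):
    while not stop_event.is_set():
        samples.append({
            "ts": time.time(),
            "cpu_pct": psutil.cpu_percent(interval=None),    # Usage since previous call
            "mem_pct": psutil.virtual_memory().percent,
        })
        time.sleep(interval_s)

stop = threading.Event()
resource_samples = []
sampler = threading.Thread(target=sample_resources, args=(resource_samples, stop))
sampler.start()
# ... run the load test here, recording latencies with timestamps ...
stop.set()
sampler.join()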
7. ✅ Reporting Best Practices
✅ Always Report:
- Hardware specs (CPU model, RAM, disk type, GPU if relevant)
- DB version, schema description, indexes
- RPS / concurrency / client-side latency distribution
- Resource usage over time
- Full config (Dockerfile, test scripts, system config)
# Basic benchmark report structure
## Setup
- DB: Postgres 15
- Host: 8-core AMD, 64GB RAM
- Client: wrk, 4 threads, 100 connections
## Results
| Metric | Value |
|-----------|------------|
| Avg Lat | 5.2 ms |
| P95 | 12.1 ms |
| P99 | 21.7 ms |
| Max | 65.0 ms |
| Throughput| 12,700 RPS |
## Observations
- P99 spikes during connection churn
- IOWait correlates with spikes on write-heavy phases
✅ Summary Checklist
- [x] Reflect real-world read/write/query patterns
- [x] Measure warm-up and resource warm-state behavior
- [x] Account for concurrency, coordinated omission
- [x] Use histograms and P99s β not just averages
- [x] Report full system config, results, and context
📚 Further Reading
- Latency Numbers Every Engineer Should Know
- Gil Tene on Coordinated Omission
- HDR Histogram
- Distributed Systems Observability