<< back to Guides

Guide to Optimize a NGINX + PHP-FPM Setup in Kubernetes

This guide addresses the deployment configurations for Kubernetes and the NGINX + PHP-FPM interaction optimizations, ensuring performance, scalability, and reliability.

Deployment Optimization

1. Separate NGINX and PHP-FPM into Independent Deployments

Why: Allows independent scaling and resource tuning.
How: Create separate Deployment objects for NGINX and PHP-FPM with dedicated requests and limits.

# NGINX Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
spec:
  replicas: 2
  template:
    spec:
      containers:
      - name: nginx
        image: nginx:stable
        resources:
          requests:
            cpu: 100m
            memory: 128Mi
          limits:
            cpu: 500m
            memory: 256Mi

---
# PHP-FPM Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: php-fpm
spec:
  replicas: 2
  template:
    spec:
      containers:
      - name: php-fpm
        image: php:fpm
        resources:
          requests:
            cpu: 500m
            memory: 256Mi
          limits:
            cpu: 1
            memory: 512Mi

2. Scale Independently with HPA

Use Horizontal Pod Autoscaler to scale each component based on its specific resource or custom metrics.

General Guidelines:

1. NGINX Scaling Metric: Request Rate or Connections

Reason: NGINX acts as the entry point, handling HTTP requests and connections. Scaling based on these metrics ensures sufficient capacity to manage incoming traffic.
Recommended Metrics:
- Requests per second (RPS): Tracks the number of HTTP requests handled by NGINX.
- Active connections: Monitors the concurrent client connections.
Kubernetes Implementation:
- Use custom metrics like nginx_ingress_controller_requests or nginx_http_connections_active (via Prometheus).

metrics:
- type: Pods
  pods:
    metricName: nginx_ingress_controller_requests
    target:
      type: AverageValue
      averageValue: 100

2. PHP-FPM Scaling Metric: CPU or Memory Utilization

Reason: PHP-FPM processes dynamic content, which is often CPU or memory-intensive. These resources are good indicators of how well PHP-FPM is managing workload.
Recommended Metrics:
- CPU Utilization: Tracks CPU stress caused by PHP scripts.
- Memory Utilization: Monitors memory used by PHP processes, including caches and worker threads.
Kubernetes Implementation:

metrics:
- type: Resource
  resource:
    name: cpu
    target:
      type: Utilization
      averageUtilization: 70

Understanding Contradictions and Particular Needs

While the general guidelines work in most cases, specific application workloads and traffic patterns can necessitate alternative scaling strategies. Here’s how to assess and adapt:

NGINX Scaling Considerations

High RPS but Low Connection Persistence:
- If your application serves many small, quick requests (e.g., a static file server), scaling by RPS works well.
- For very persistent connections (e.g., WebSockets), scaling by active connections is better.
Custom Workload:
- For APIs with uneven traffic spikes, scale based on latency (e.g., response time metrics).
Edge Case Example:
- A WebSocket-heavy app might only need a few NGINX replicas, even with many users, as persistent connections reduce request overhead.

PHP-FPM Scaling Considerations

High Concurrency with Low Resource Per Request:
- If PHP processes many lightweight requests (e.g., REST APIs with minimal computation), you may scale PHP-FPM based on active processes instead of CPU or memory.
Memory Leaks or Inefficient Scripts:
- If poorly optimized PHP scripts cause memory bloat, memory utilization may not indicate real scaling needs. Investigate script efficiency instead of adding replicas.
Batch or Background Processing:
- When PHP is used for long-running tasks (e.g., report generation), scale based on queue length (number of jobs pending) rather than resource usage.

How to Identify Particular Needs

Analyze Application Behavior:
- Use tools like Prometheus + Grafana or NGINX Amplify to gather metrics:
  - Response time
  - Error rates
  - Resource utilization
- Compare metrics to identify bottlenecks.
Profile Traffic Patterns:
- Is the workload spiky (e.g., event-driven traffic)?
- Is there a high proportion of static vs. dynamic content?
- Does your application use long-lived connections (e.g., WebSockets)?
Perform Load Testing:
- Simulate real-world traffic using tools like k6 or wrk to understand scaling needs under pressure.
Iterative Tuning:
- Start with general scaling rules and gradually refine targets based on observed behavior.

Examples of Contradictions in Practice

Case 1: A PHP-FPM deployment for a chat app uses persistent WebSocket connections. Despite high traffic, scaling PHP-FPM by CPU fails because the script is I/O-bound. Scaling by the length of message queues or open connections may be more relevant.
Case 2: NGINX in an API gateway receives high traffic but only routes requests to microservices. CPU and memory usage remain low. Scaling by RPS ensures replicas match traffic peaks without overprovisioning resources.

By understanding these nuances, you can tailor scaling strategies to your workload while optimizing costs and performance.

3. Service Communication

Use a Kubernetes Service to connect NGINX with PHP-FPM:
- NGINX talks to the PHP-FPM Service (e.g., http://php-fpm.default.svc.cluster.local).
Example Service for PHP-FPM:

apiVersion: v1
kind: Service
metadata:
  name: php-fpm
spec:
  selector:
    app: php-fpm
  ports:
  - protocol: TCP
    port: 9000 # PHP-FPM default
    targetPort: 9000

NGINX Configuration

1. Enable Keep-Alive Connections

Optimize persistent connections between NGINX and PHP-FPM to reduce overhead.
Example in nginx.conf:

http {
    upstream php-fpm {
        server php-fpm.default.svc.cluster.local:9000;
        keepalive 32; # Number of persistent connections
    }

    server {
        location ~ \.php$ {
            fastcgi_pass php-fpm;
            include fastcgi_params;
            fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
            fastcgi_keep_conn on;
        }
    }
}

Note: Pros and Cons of Keep-Alive Connections

Pros:

Reduced Latency: Eliminates the overhead of establishing a new TCP connection for each request, speeding up interactions between client and server.

Lower Resource Usage: Saves CPU and memory by avoiding repeated handshakes (e.g., TLS/SSL negotiation).

Improved Throughput: Multiple requests can share a single connection, leading to better utilization of network resources.

Better User Experience: Faster response times and smoother browsing for users.

Cons:

Higher Resource Consumption on the Server: Long-lived connections may occupy server memory and connection slots, reducing capacity for new clients.

Potential for Overloading: If keep-alive timeouts are too long, idle connections can overwhelm the server, especially under high traffic.

Requires Fine-Tuning: Needs optimal timeout and connection limit configurations to avoid inefficiencies.

Compatibility Issues: Older or non-standard clients may not fully support keep-alive, leading to inconsistencies.

When to Use:
Keep-alive is ideal for scenarios with frequent and repeated client-server interactions (e.g., APIs or websites with many assets). Avoid or limit keep-alive for low-traffic, infrequent interactions where connections are likely to sit idle.

2. Buffering

Use fastcgi_buffering to improve response time for dynamic content:

fastcgi_buffer_size 16k;
fastcgi_buffers 4 16k;
fastcgi_busy_buffers_size 32k;
fastcgi_temp_file_write_size 32k;

Note: Pros and Cons of FastCGI Buffering

Pros:

Improved Performance: Buffers responses in memory before sending them to the client, reducing I/O overhead and improving throughput.

Better Handling of Slow Clients: Allows NGINX to quickly fetch responses from the backend and hold them in memory, freeing up the FastCGI backend for other requests.

Reduced Backend Load: Limits the time FastCGI processes spend serving responses directly, enhancing scalability under heavy traffic.

Optimized Resource Usage: Useful for handling large, static-like dynamic content (e.g., large HTML pages).

Cons:

Increased Memory Usage: Buffering requires memory allocation, which can grow significantly with high traffic or large responses.

Latency for Streaming Content: Adds latency when serving real-time or streaming data, as the response is buffered before being sent to the client.

Complex Tuning Required: Requires careful adjustment of buffer sizes (fastcgi_buffers, fastcgi_buffer_size) to avoid wasted resources or unintended performance issues.

Not Always Needed: For small, quick responses, buffering can introduce unnecessary overhead.

When to Use:
Enable FastCGI buffering for most scenarios, especially when handling dynamic content or high traffic. Consider disabling buffering for low-latency, streaming, or real-time applications where immediate delivery of data is critical.

3. Static File Serving

Offload static file requests directly to NGINX instead of routing them to PHP-FPM:

location / {
    try_files $uri $uri/ /index.php?$query_string;
}

location ~* \.(jpg|jpeg|png|css|js|ico|gif|svg|woff2|woff|ttf|eot|otf)$ {
    expires 1y;
    add_header Cache-Control "public, no-transform";
    access_log off;
}

Note: Pros and Cons of Static File Serving

Pros:

Improved Performance: Serving static files directly from NGINX is much faster than processing them through PHP or other backend services, reducing response times.

Low Resource Usage: Static files do not require server-side processing, freeing up resources for dynamic content handling.

Efficient Caching: Static files can be easily cached at multiple levels (e.g., in browser cache, CDN, or NGINX), reducing load on the server and improving delivery speed.

Better Scalability: Serving static content through NGINX reduces the need for scaling backend services, as static content doesn't put additional load on application servers.

Cons:

Limited Flexibility: Static file serving doesn’t allow dynamic behavior (e.g., user personalization), and it can’t handle content changes without updating the file itself.

Potential for Stale Content: Without proper cache control, clients may receive outdated content if the cache is not refreshed or invalidated.

File Management Complexity: Managing static assets (e.g., versioning, minification, or compression) requires additional setup and maintenance.

When to Use:
Serve static files through NGINX whenever possible, especially for assets like images, CSS, and JavaScript. Use cache headers and CDNs to further optimize delivery. Avoid serving static content through PHP or other dynamic backends to improve performance and reduce server load.

4. Rate Limiting

Prevent overloading PHP-FPM with rate limits:

limit_req_zone $binary_remote_addr zone=php_limit:10m rate=10r/s;
location ~ \.php$ {
    limit_req zone=php_limit;
}

Note: Pros and Cons of Rate Limiting

Pros:

Prevents Abuse: Protects your server from excessive requests, which could overwhelm the system, by limiting the number of requests from a client within a specific time frame.

Improved Reliability: Ensures fair usage of resources, preventing any single user from consuming too many resources and negatively affecting others.

Mitigates DDoS Attacks: Helps mitigate simple Denial of Service (DoS) or Distributed Denial of Service (DDoS) attacks by controlling the rate of incoming traffic.

Cons:

Can Block Legitimate Users: If set too aggressively, rate limits might block legitimate traffic, especially from users with varying request patterns.

Requires Fine-Tuning: Needs proper configuration to balance between protecting resources and ensuring a smooth user experience.

May Introduce Latency: For APIs or services with strict rate limits, users might experience delays if they hit the limit too quickly.

When to Use:
Implement rate limiting for services that are exposed to the internet, such as APIs or login pages, to prevent abuse and maintain performance under high load.

PHP-FPM Configuration

1. Pool Configuration

Tune PHP-FPM pools for resource usage and concurrency (/usr/local/etc/php-fpm.d/www.conf):

pm = dynamic
pm.max_children = 50     # Max simultaneous requests
pm.start_servers = 10    # Initial number of workers
pm.min_spare_servers = 5 # Minimum idle workers
pm.max_spare_servers = 15 # Maximum idle workers

2. Process Management

Monitor and optimize memory usage:

pm.max_requests = 500    # Restart workers after processing 500 requests

Use pm.status_path for insights into PHP-FPM performance:

pm.status_path = /status

3. Enable OPcache

Use OPcache for PHP performance improvements:

opcache.enable=1
opcache.memory_consumption=128
opcache.interned_strings_buffer=8
opcache.max_accelerated_files=4000

Monitoring and Debugging

1. NGINX Metrics

Enable NGINX Status for Prometheus scraping:

location /nginx_status {
    stub_status;
    allow 127.0.0.1; # Allowlist
    deny all;
}

2. PHP-FPM Metrics

Monitor PHP-FPM status via the /status endpoint (use a tool like Prometheus exporter for PHP-FPM).

3. Log Optimization

Rotate and optimize logs:
- Use minimal logging levels for NGINX (error_log /var/log/nginx/error.log warn;).
- Log slow requests in PHP-FPM:

request_slowlog_timeout = 5s
slowlog = /var/log/php-fpm-slow.log

Testing and Iterative Improvement

1. Load Testing:

Use tools like wrk or k6 to simulate traffic and evaluate scaling behavior.

2. Observability:

Use Prometheus and Grafana for resource monitoring.
Track metrics like:
- NGINX: Requests per second, connections, response times.
- PHP-FPM: Active processes, memory usage, request durations.

3. Iterative Tuning:

Adjust HPA targets, buffer sizes, and pool configurations based on observed bottlenecks.

Summary of alternatives to a NGINX+PHP-FPM

Option	Best For	Pros	Cons	Reference
Apache with mod_php	Simpler setups, legacy environments	Easy to configure, no need for FPM, `.htaccess` support	Lower performance and scalability compared to NGINX + FPM	Apache
Caddy with PHP-FPM	Simple configuration and automatic HTTPS	Easy to set up, secure by default, automatic SSL/TLS	Less flexible than NGINX, smaller community	Caddy
LiteSpeed with LSAPI	High-performance PHP apps requiring scalability	High performance, supports HTTP/2, out-of-the-box optimizations	Proprietary software, licensing costs	LiteSpeed
OpenLiteSpeed	Open-source high-performance PHP apps	Free, fast, optimized for PHP, supports HTTP/2 and caching	Smaller community than NGINX/Apache	OpenLiteSpeed
PHP Built-In Server	Development or lightweight production setups	Simple to use, no server setup required	Not suitable for production, limited scalability	PHP Manual
Node.js with PHP-FPM	Integrating PHP into a Node.js ecosystem	Flexibility for real-time capabilities, can combine multiple technologies	Additional complexity, potential latency from proxying	Node.js
Docker + PHP-FPM	Microservices or containerized environments	Scalability, isolation, works well with cloud environments and Kubernetes	Docker overhead, additional complexity	Docker
Serverless PHP	Event-driven or highly dynamic applications	No infrastructure management, auto-scaling, pay-per-use	Cold start latency, limited runtime, limited control over environment	AWS Lambda

Summary of alternatives for Monitoring and Telemetry

Option	Best For	Pros	Cons	Reference
Prometheus	Time-series data, monitoring and alerting	Open-source, highly customizable, integrates with Kubernetes, large community support	Requires configuration and management, storage overhead, may require additional tools for full-stack observability	Prometheus
Grafana	Visualizing metrics from various sources	Powerful visualization and dashboarding, integrates with multiple data sources, open-source	May require integration with other tools (e.g., Prometheus) for data collection and alerting	Grafana
Datadog	Full-stack monitoring and APM (Application Performance Monitoring)	Comprehensive monitoring, real-time alerting, cloud-based, easy to set up	Paid service, pricing can scale with usage, cloud vendor lock-in	Datadog
New Relic	APM, monitoring for web apps and microservices	Detailed application performance insights, real-time monitoring and alerting	Paid service, can become expensive, requires agent installation	New Relic
Elastic Stack (ELK Stack)	Log management, monitoring, and analytics	Centralized logging, advanced search and analysis, integrates with Kubernetes	Setup and management complexity, storage and performance can be an issue for large-scale environments	Elastic Stack
Zabbix	Traditional monitoring for infrastructure	Open-source, supports wide range of monitoring types, can be highly customized	May have a steeper learning curve, not as modern in terms of UI and integrations	Zabbix
InfluxDB	Time-series database for monitoring and metrics	Optimized for high-write loads, easy integration with monitoring tools like Grafana	Requires additional tools for full-stack observability, focused mainly on time-series data	InfluxDB
Azure Monitor	Microsoft Azure services, cloud infrastructure monitoring	Native integration with Azure, real-time monitoring, scalable	Limited to Azure, not as flexible as open-source solutions	Azure Monitor
Sysdig	Kubernetes, container, and cloud-native monitoring	Deep monitoring for containers and Kubernetes, integrates with Prometheus and Grafana	Paid service for advanced features, limited free tier	Sysdig
Splunk	Logs and machine data analysis	Powerful log aggregation, real-time analytics, APM capabilities	Paid service, may become expensive at scale, setup complexity	Splunk
Nagios	IT infrastructure monitoring	Widely used, open-source, highly customizable, alerting capabilities	Older tool, can be complex to configure, UI is less modern	Nagios

Glossary of Terms

Term	Description
PHP-FPM	PHP FastCGI Process Manager, a PHP implementation designed to provide faster execution by handling multiple PHP processes concurrently.
NGINX	A high-performance web server that can also be used as a reverse proxy, load balancer, and HTTP cache.
Keep-Alive	A mechanism that keeps connections open for reuse, reducing the overhead of creating new connections for each request.
FastCGI	A protocol for interfacing interactive programs with a web server, used to communicate between NGINX and PHP-FPM.
HPA (Horizontal Pod Autoscaler)	A Kubernetes resource that automatically scales the number of pods in a deployment based on observed CPU utilization or custom metrics.
Request (in Kubernetes)	Defines the minimum amount of CPU or memory a container is guaranteed to have available in Kubernetes.
Limit (in Kubernetes)	Specifies the maximum amount of CPU or memory a container can use in Kubernetes.
Pod	A group of one or more containers in Kubernetes that share storage and network resources, and a specification for how to run the containers.
Service (in Kubernetes)	A Kubernetes abstraction that defines a logical set of Pods and a policy to access them, commonly used for load balancing.
Deployment (in Kubernetes)	A resource that provides declarative updates to Pods and ReplicaSets, ensuring the desired state of applications is maintained.
Autoscaling	The process of automatically adjusting the number of active instances (pods) based on the load or other performance metrics.
Prometheus	An open-source monitoring and alerting toolkit designed for reliability and scalability, often used with Kubernetes.
Grafana	An open-source analytics and monitoring platform that integrates with Prometheus to visualize and query metrics.
Docker	A platform used to automate the deployment, scaling, and management of applications using containerization technology.
Node.js	A JavaScript runtime built on Chrome's V8 JavaScript engine, often used for building scalable network applications.
Serverless	A cloud computing execution model where the cloud provider automatically manages the infrastructure, often used for event-driven apps.
TLS (Transport Layer Security)	A cryptographic protocol designed to provide secure communication over a computer network, commonly used in web servers for HTTPS.
TLS Termination	The process of decrypting secure HTTPS traffic before it reaches the backend application, often performed by reverse proxies like NGINX.
Rate Limiting	A technique used to limit the number of requests a client can make to a service in a given period, preventing overload.
Caching	Storing copies of files or data in temporary storage locations to reduce latency and improve access speed.
Elastic Stack (ELK Stack)	A suite of tools (Elasticsearch, Logstash, Kibana) used for search, logging, and visualization, commonly used in observability setups.
Prometheus Metrics	Time-series data collected by Prometheus from configured targets (like Kubernetes pods or applications) that represent performance metrics.
Time-Series Data	Data points indexed by time, commonly used for performance monitoring, such as metrics on CPU usage, memory consumption, etc.
APM (Application Performance Monitoring)	A suite of tools for monitoring and analyzing the performance of software applications in real time.
Node Exporter	A Prometheus exporter that collects hardware and OS metrics from Linux nodes, often used with Prometheus for infrastructure monitoring.
Horizontal Scaling	Increasing the number of resources or instances (e.g., pods) to distribute load and enhance system performance.
Vertical Scaling	Increasing the resources (CPU or memory) available to a single instance, such as a pod, to handle more load.
FPM Pool	A set of PHP processes managed by PHP-FPM that handle incoming requests, with the number of processes configurable for performance.
ElasticSearch	A distributed search and analytics engine that is commonly used in observability and logging systems.
ReplicaSet	A Kubernetes controller that ensures a specified number of pod replicas are running at any given time, often part of a deployment.
Pod Disruption Budget (PDB)	A policy in Kubernetes that specifies the minimum number of pods that should remain available during voluntary disruptions like updates or scaling.
API Gateway	A server that acts as an API frontend, receiving API requests, routing them to the appropriate service, and aggregating the results.

<< back to Guides