
Constant Nginx disk writes cause high CPU I/O wait


I'm running nginx/1.20.1 on a G9 CentOS 7 machine to serve static video files. The machine has the following specs:

  • 32 cores of CPU
  • 32GB of RAM
  • 6TB of HDD storage

Nginx config:

user root;
worker_processes auto;
error_log /var/log/nginx/error.log;
pid /run/nginx.pid;

# Load dynamic modules. See /usr/share/nginx/README.dynamic.
include /usr/share/nginx/modules/*.conf;
worker_rlimit_nofile 30000;
events {
    worker_connections 2024;
    use epoll;
}

http {
    log_format  main  '$remote_addr - $remote_user [$time_local] "$request" '
                      '$status $body_bytes_sent "$http_referer" '
                      '"$http_user_agent" "$http_x_forwarded_for"';

    access_log  /var/log/nginx/access.log  main;

    sendfile            on;
    directio            16m;
#    output_buffers     2 32m;
#    aio                        threads;
    sendfile_max_chunk 512k;
    tcp_nopush          on;
    tcp_nodelay         on;
    keepalive_timeout   120;
    types_hash_max_size 2048;
    # allow the server to close connection on non responding client, this will free up memory
    reset_timedout_connection on;
    # request timed out -- default 60
    client_body_timeout 60;
    # if client stop responding, free up memory -- default 60
    send_timeout 30;

    include             /etc/nginx/mime.types;
    default_type        application/octet-stream;
    client_max_body_size 200m;

    # Load modular configuration files from the /etc/nginx/conf.d directory.
    # See http://nginx.org/en/docs/ngx_core_module.html#include
    # for more information.
    include /etc/nginx/conf.d/*.conf;

}

conf.d:

server{
    listen 80;
    server_name  mydomain.com;
    charset utf-8;
    sendfile   on;
    tcp_nopush on;

    fastcgi_read_timeout 600;
    client_header_timeout 600;
    client_body_timeout 600;
    client_max_body_size 0;

    access_log /var/log/nginx/static.access_log main;
    error_log  /var/log/nginx/static.error_log error;


    location / {
        proxy_pass http://localhost:7070;
        proxy_http_version 1.1;
        proxy_set_header Connection "";
    }

    # prevent nginx from serving dotfiles (.htaccess, .svn, .git, etc.)
    location ~ /\. {
        deny all;
        access_log off;
        log_not_found off;
    }


}

server {
    set $base_path "/mypath";
    set $news_video_path "/mypath2";
    listen 7070;
    server_name localhost;
    location ~ /upload/videos/(.*) {
        alias $news_video_path/$1;
    }

    location ~ /video/(.*) {
        alias $base_path/video/$1;
    }


    access_log /var/log/nginx/localhost.access_log main;
    error_log  /var/log/nginx/localhost.error_log error;
}
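The port-7070 server resolves video URIs only through regex locations combined with alias. Below is a rough Python model of that lookup. It assumes nginx's documented rule that regex locations are tried in the order they appear in the config and the first match wins. It ignores URI normalization and the server's default root fallback. `resolve()` and `LOCATIONS` are hypothetical names used only for illustration:

```python
# Rough model of how the port-7070 server maps a request URI to a file.
# Assumption: regex locations are checked in config order, first match wins;
# "~" is a case-sensitive, unanchored search. resolve() is hypothetical.
import re

BASE_PATH = "/mypath"         # $base_path
NEWS_VIDEO_PATH = "/mypath2"  # $news_video_path

# (regex, alias template) pairs in config order; {0} stands in for $1.
LOCATIONS = [
    (re.compile(r"/upload/videos/(.*)"), NEWS_VIDEO_PATH + "/{0}"),
    (re.compile(r"/video/(.*)"), BASE_PATH + "/video/{0}"),
]


def resolve(uri: str):
    """Return the aliased file path, or None if neither regex location matches."""
    for pattern, template in LOCATIONS:
        match = pattern.search(uri)  # unanchored, like nginx's "~"
        if match:
            return template.format(match.group(1))
    return None
```

Because `~` is unanchored, a URI such as `/cdn/video/c.mp4` would also match the second location and resolve to `/mypath/video/c.mp4`.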

The problem is that when the nginx process starts, the load average keeps climbing until the CPU shows 100% usage. I used htop to find the process that was consuming the CPU, but there was no such process. Then I checked our monitoring dashboard and found that I/O wait is what causes the high load average:

[Screenshot: monitoring dashboard]
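The dashboard's I/O wait figure comes from the kernel's CPU time counters. Below is a minimal Python sketch, for Linux only and assuming the standard `/proc/stat` layout, that reproduces it by sampling the aggregate `cpu` line twice. The function names are hypothetical helpers, not part of any monitoring tool:

```python
# Sketch: estimate the share of CPU time spent in I/O wait from the
# aggregate "cpu" line of /proc/stat (Linux only). Helper names are
# illustrative, not taken from any monitoring tool.
import time


def parse_cpu_line(stat_text: str) -> list[int]:
    """Return the counters (in USER_HZ ticks) from the aggregate 'cpu' line.

    Field order: user nice system idle iowait irq softirq steal guest guest_nice
    """
    for line in stat_text.splitlines():
        if line.startswith("cpu "):
            return [int(v) for v in line.split()[1:]]
    raise ValueError("no aggregate 'cpu' line found")


def iowait_percent(before: list[int], after: list[int]) -> float:
    """Percentage of elapsed CPU time spent in iowait between two samples."""
    # guest/guest_nice are already included in user/nice, so only the
    # first eight fields (user..steal) are summed.
    deltas = [a - b for a, b in zip(after[:8], before[:8])]
    total = sum(deltas)
    return 100.0 * deltas[4] / total if total else 0.0


def sample(interval: float = 1.0) -> float:
    """Read /proc/stat twice, `interval` seconds apart, and return iowait %."""
    with open("/proc/stat") as f:
        first = parse_cpu_line(f.read())
    time.sleep(interval)
    with open("/proc/stat") as f:
        second = parse_cpu_line(f.read())
    return iowait_percent(first, second)
```

On Linux, tasks blocked on disk I/O sit in uninterruptible sleep, which the load average counts. So a high load average with no busy process in htop is consistent with high I/O wait.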

Then I used iotop to see which processes were waiting on I/O:

[Screenshot: iotop output]

The strange thing is that the Nginx worker processes show a high disk write rate. Total DISK WRITE sometimes reaches 100 MB/s, but Actual DISK WRITE does not follow the same pattern. I should also mention that I don't use the Nginx cache, so these writes are not related to caching. Disabling Nginx logging didn't help either.
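You can also watch the workers' writes without iotop by reading `/proc/<pid>/io`. This file exposes the same kernel per-task I/O accounting that iotop's per-process columns are built on. The counters mean the following:

- `wchar` counts bytes passed to write-type system calls, whether or not they reach a disk.
- `write_bytes` is accounted when pages are dirtied in the page cache.
- `cancelled_write_bytes` counts dirty data that was truncated or deleted before being flushed.

This accounting is one reason per-process totals and actual device writes can differ. Below is a Python sketch (Linux only; reading other processes' counters generally needs root). The helper names are hypothetical:

```python
# Sketch: sample /proc/<pid>/io for nginx processes (Linux only).
# Helper names are hypothetical; reading another user's counters needs root.
import os
import time


def parse_proc_io(text: str) -> dict[str, int]:
    """Parse the 'key: value' lines of /proc/<pid>/io into a dict."""
    counters = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            counters[key.strip()] = int(value)
    return counters


def write_rates(before: dict[str, int], after: dict[str, int], interval: float) -> dict[str, float]:
    """Bytes per second for the write-related counters between two samples."""
    keys = ("wchar", "write_bytes", "cancelled_write_bytes")
    return {k: (after[k] - before[k]) / interval for k in keys}


def nginx_pids() -> list[int]:
    """Find PIDs whose command name (/proc/<pid>/comm) is 'nginx'."""
    pids = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/comm") as f:
                if f.read().strip() == "nginx":
                    pids.append(int(entry))
        except OSError:
            continue  # process exited between listdir() and open()
    return pids


def read_io(pid: int) -> dict[str, int]:
    with open(f"/proc/{pid}/io") as f:
        return parse_proc_io(f.read())


def sample_nginx(interval: float = 5.0) -> dict[int, dict[str, float]]:
    """Per-PID write rates over `interval` seconds."""
    before = {}
    for pid in nginx_pids():
        try:
            before[pid] = read_io(pid)
        except OSError:
            pass  # exited, or permission denied
    time.sleep(interval)
    rates = {}
    for pid, first in before.items():
        try:
            rates[pid] = write_rates(first, read_io(pid), interval)
        except OSError:
            pass  # worker exited during the interval
    return rates
```

Comparing `write_bytes` with `cancelled_write_bytes` for each worker shows whether the written data actually reaches the disk or is discarded before writeback.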

How can I debug this? Why is nginx writing so much data to disk?


Solution

  • The problem was the missing Nginx multi_accept directive. Since we serve video files, which are typically very large, Nginx couldn't accept new connections while it was busy serving video files to other users.

    Adding multi_accept on to the events block solved the issue.

    events {
        worker_connections 1024;
        multi_accept       on;
        use epoll;
    }
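The nginx documentation describes multi_accept as follows. When it is off, a worker accepts one new connection at a time. When it is on, the worker accepts all new connections at a time. Below is a Python sketch of that difference on a non-blocking listening socket. It illustrates the documented behaviour and is not nginx's C implementation; `accept_pending()` is a hypothetical name:

```python
# Sketch of the documented multi_accept behaviour on a non-blocking
# listening socket; not nginx's actual code. accept_pending() is hypothetical.
import socket


def accept_pending(listener: socket.socket, multi_accept: bool) -> list[socket.socket]:
    """Accept one pending connection (off) or drain the accept queue (on)."""
    accepted = []
    while True:
        try:
            conn, _addr = listener.accept()
        except BlockingIOError:
            break  # no more pending connections in the kernel's queue
        accepted.append(conn)
        if not multi_accept:
            break  # multi_accept off: one connection per readiness event
    return accepted
```

With multi_accept off, any remaining connections stay queued in the kernel until the listening socket is reported ready again.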