nginx, caching, reverse-proxy, nginx-reverse-proxy

Enable caching for NGINX third party proxy


I'm trying to avoid hitting a third-party API too often by caching its responses, so that we stop getting the 429 responses we're currently seeing from the API endpoint.

To accomplish this I've set up a Linode server running Ubuntu 20.04.

The config file /etc/nginx/conf.d/nginx.conf is as follows:

server {
        server_name myserver-name-proxy.server.com;
        access_log /var/log/access.log main;
        error_log /var/log/error.log info;

        location / {
                proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
                proxy_cache my_cache;
                proxy_ignore_headers Cache-Control;
                proxy_cache_methods GET HEAD POST;
                proxy_cache_use_stale error timeout updating http_500 http_502 http_503 http_504;
                proxy_cache_background_update on;
                proxy_cache_lock on;
                add_header X-Cache-Status $upstream_cache_status;
                proxy_pass https://www.thirdpartyserver.com;
                proxy_ssl_session_reuse on;
                proxy_ssl_server_name on;
                proxy_set_header X-Forwarded-Proto https;
                proxy_buffering on;
                proxy_cache_key $scheme$proxy_host$request_uri;
        }
        default_type application/json;

    listen [::]:443 ssl ipv6only=on; # managed by Certbot
    listen 443 ssl; # managed by Certbot
    ssl_certificate /etc/letsencrypt/live/myserver-name-proxy.server.com/fullchain.pem; # managed by Certbot
    ssl_certificate_key /etc/letsencrypt/live/myserver-name-proxy.server.com/privkey.pem; # managed by Certbot
    include /etc/letsencrypt/options-ssl-nginx.conf; # managed by Certbot
    ssl_dhparam /etc/letsencrypt/ssl-dhparams.pem; # managed by Certbot

}

server {
    if ($host = myserver-name-proxy.server.com) {
        return 301 https://$host$request_uri;
    } # managed by Certbot


        server_name myserver-name-proxy.server.com;

        listen 80;
        listen [::]:80;
    return 404; # managed by Certbot
}

Then the main config file /etc/nginx/nginx.conf is:

user  nginx;
worker_processes  auto;

error_log  /var/log/nginx/error.log notice;
pid        /var/run/nginx.pid;


events {
    worker_connections  1024;
}


http {
    include       /etc/nginx/mime.types;
    default_type  application/octet-stream;

    log_format  main  '$remote_addr - $remote_user [$time_local] "$request" '
                      '$status $body_bytes_sent $request_time "$http_referer" '
                      '"$http_user_agent" "$http_x_forwarded_for" "$upstream_cache_status" "$http_x_cache_status"';

    proxy_cache_path /var/cache/nginx/cache levels=1:2 keys_zone=my_cache:10m max_size=10g inactive=24h use_temp_path=off;

    access_log  /var/log/nginx/access.log  main;

    sendfile        on;
    #tcp_nopush     on;

    keepalive_timeout  65;

    #gzip  on;

    include /etc/nginx/conf.d/*.conf;
}

There are no errors, yet when I tail the access log files or check the response headers, there is only ever MISS coming back from NGINX as the X-Cache-Status ($http_x_cache_status in the log format).

So far I've tried changing the proxy_cache_path to a new folder: on restart the folder is created by NGINX, but nothing is ever written to it. I've also tried a number of other things, like switching off the background update, the cache lock, etc.

The only difference I can see between this and all the tutorials out there is that I'm using it with SSL, hitting an https:// endpoint, and using proxy_ssl_session_reuse and proxy_ssl_server_name in the settings.
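
For reference, the SSL-related directives I've added for the https upstream are just these two (as I understand them from the nginx docs):

    # Reuse TLS sessions to the upstream between requests (on by default).
    proxy_ssl_session_reuse on;
    # Send SNI when connecting to the upstream; the name defaults to the host
    # part of the proxy_pass URL.
    proxy_ssl_server_name on;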


Solution

  • There were two issues with this.

    Firstly, the upstream server was not returning any expiry headers in its responses, so NGINX would not cache those items. I was also setting proxy_ignore_headers Cache-Control;.

    To ensure that NGINX caches these responses, you need to include proxy_cache_valid.
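
    For example (a minimal sketch; when proxy_cache_valid is given only a time, it applies to 200, 301 and 302 responses, and any other status codes have to be listed explicitly):

            # Cache 200/301/302 responses for 10 minutes even though the upstream
            # sends no caching headers (Cache-Control is ignored above).
            proxy_cache_valid 10m;
            # Optionally cache 404s for a shorter time.
            proxy_cache_valid 404 1m;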

    This worked for one endpoint, but my other endpoint was still failing.

    The second reason was that the origin/upstream server was responding with a Set-Cookie header, which meant the response was not being cached. To fix this I needed to also add Set-Cookie to proxy_ignore_headers.

    My final rule set was

           location / {
                    proxy_pass https://www.thirdpartyserver.com;
                    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
                    proxy_cache nft_cache;
                    proxy_ignore_headers Cache-Control Set-Cookie;
                    proxy_cache_methods GET HEAD POST;
                    proxy_cache_use_stale error timeout updating http_500 http_502 http_503 http_504 http_429;
                    proxy_cache_background_update on;
                    proxy_cache_lock on;
                    add_header X-Cache-Status $upstream_cache_status;
                    proxy_ssl_session_reuse on;
                    proxy_ssl_server_name on;
                    proxy_ssl_verify off;
                    proxy_set_header X-Forwarded-Proto https;
                    proxy_buffering on;
                    proxy_cache_key $scheme$proxy_host$request_uri;
                    proxy_cache_valid 10080m;
           }

    One other update I made was for the server that was caching the POST requests. Because the URL never changes, you should update proxy_cache_key to also include $request_body:

    proxy_cache_key $scheme$proxy_host$request_uri$request_body;

    This means that you can hit the same POST endpoint with different request bodies and know that you're going to get the right cached response back (this was done for a GraphQL endpoint, not for posting forms).
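
    For this to work, proxy_cache_methods has to include POST (as it does above). Also note that $request_body is only populated once NGINX has read the request body into a memory buffer, so large GraphQL queries may need a bigger client_body_buffer_size. A sketch of the relevant directives (the 16k value is just an example):

            # POST has to be listed explicitly; GET and HEAD are always included.
            proxy_cache_methods GET HEAD POST;
            # Key on the body as well as the URL so that different queries to the
            # same endpoint get separate cache entries.
            proxy_cache_key $scheme$proxy_host$request_uri$request_body;
            # $request_body is only set when the body fits in the client body buffer.
            client_body_buffer_size 16k;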

    --Edit--

    I noticed that on some POST requests the cache was being skipped again. It turned out that this was because the proxy buffers weren't large enough to hold the response. I also had to include:

    proxy_buffers 8 32k;
    proxy_buffer_size 64k;
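
    These sit alongside the other proxy settings (with proxy_buffering on) in the location block; roughly what each one controls, per the nginx docs:

            # Buffer used for the first part of the upstream response, which
            # usually contains just the response headers.
            proxy_buffer_size 64k;
            # Number and size of the buffers used for the rest of the response
            # body, per connection.
            proxy_buffers 8 32k;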