Tags: r, nginx, rserve, nginx-reverse-proxy

Rate limiting in the built-in HTTP server of Rserve?


I'm looking into the built-in HTTP server of Rserve (1.8.5) after modifying .http.request() from FastRWeb. It works fine with the updated request function, but whenever the number of concurrent requests is high, some or most of them fail with the following errors.

WARNING: fork() failed in fork_http(): Cannot allocate memory

WARNING: fork() failed in Rserve_prepare_child(): Cannot allocate memory

This is because there isn't enough free memory left to fork new child processes, so it is necessary to limit the number of concurrent requests in one way or another.

I tried a couple of client layers in front of Rserve: (1) Python's requests + hug libraries and (2) Python's pyRserve + hug libraries, with the number of worker processes matched to the number of CPUs. I also tried a reverse proxy with Nginx, in both (3) a single-container and (4) a multi-container setup.
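
A minimal sketch of what a client layer like (2) can look like is below; the endpoint name, the R function being called, and the assumption that the app is served by gunicorn with one worker per CPU are placeholders for illustration, not my exact code.

# Sketch of (2): a thin hug endpoint that forwards work to Rserve via pyRserve.
# The '/run' route and 'my_fun' are placeholders.
import hug
import pyRserve

@hug.get('/run')
def run(x: hug.types.number = 1):
    conn = pyRserve.connect(host='localhost', port=6311)  # Rserve's default QAP port
    try:
        # Evaluate an R call on the Rserve side and return its result.
        return conn.eval('my_fun({})'.format(x))
    finally:
        conn.close()

# Served e.g. with the worker count matched to the CPU count:
#   gunicorn -w $(nproc) app:__hug_wsgi__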

In all these cases, I observe some overhead (~300-450 ms) compared to running only Rserve with its built-in HTTP server.

I guess using it as it is would be the most efficient option, but I'm concerned that it just keeps trying to fork and returning errors. (Besides errors being thrown quickly, it also wouldn't be easy to auto-scale using typical metrics such as CPU utilization or mean response time.)

Does anyone know of a way to enforce rate limiting, with or without relying on another tool, that doesn't sacrifice performance?

My Rserve config is roughly as follows.

http.port 8000
socket /var/rserve/socket
sockmod 0666
control disable
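
(For completeness, Rserve is launched with this config along the lines of the command below; the config-file path is just a placeholder.)

R CMD Rserve --RS-conf /etc/Rserve.conf --no-save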

Also here is a simplified nginx.conf.

worker_processes auto;

events {
    worker_connections 1024;
}

http {
    upstream backend {
        server 127.0.0.1:8000;
    }

    server {
        listen 80;

        location / {
            proxy_pass http://backend;
        }
    }
}
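
(For reference, Nginx's own limit_req / limit_conn directives are one way such limiting could be expressed at the proxy layer; the sketch below uses placeholder zone names, rates and connection counts that I have not benchmarked.)

http {
    # Shared-memory zones keyed by client address; names, sizes and rates are placeholders.
    limit_req_zone $binary_remote_addr zone=rserve_req:10m rate=50r/s;
    limit_conn_zone $binary_remote_addr zone=rserve_conn:10m;

    upstream backend {
        server 127.0.0.1:8000;
    }

    server {
        listen 80;

        location / {
            limit_req zone=rserve_req burst=20;   # queue short bursts, reject the rest (503 by default)
            limit_conn rserve_conn 10;            # cap concurrent connections per client
            proxy_pass http://backend;
        }
    }
}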

Solution

  • I was misled by Locust (a load-testing tool): it showed cached output for the setup with only Rserve and its built-in HTTP server.

    Manual investigation shows Rserve + Nginx returns a slightly improved result.