
High memory usage in OpenCPU


R requires CPU more than anything else, so it is recommended to pick one of the newer-generation compute-optimized instance types, preferably with an SSD disk.

I've recently run into a problem with high memory usage (quickly rising to 100%) during load testing. To reproduce: there is an R package whose processing time is up to 0.2 seconds per request under no-stress conditions. If I query one of its endpoints with curl, sending 1000 JSON requests from 3 machines in parallel, all of the memory is suddenly consumed, which results in 'cannot fork' or:

cannot popen '/usr/bin/which 'uname' 2>/dev/null', probable reason 'Cannot allocate memory' In call: system(paste(which, shQuote(names[i])), intern = TRUE, ignore.stderr = TRUE)
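For context, the load test looks roughly like the sketch below; the host, package, function, and payload are placeholders, not the actual endpoint:

seq 1000 | xargs -P50 -I{} \
  curl -s -o /dev/null -X POST \
    https://ocpu-host/ocpu/library/mypackage/R/myfunction/json \
    -H 'Content-Type: application/json' -d '{"n": 1}'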

The setup is 2 AWS compute-optimized servers with 8 GB of RAM each, plus a load balancer, all in a private network. HTTPS is enabled, and my main use case is online processing of requests, so I'm mostly querying the /json endpoints.

Do you have any suggestions on how to approach this issue? The plan is to install more packages (more online processes requesting results from various functions), and I don't want to end up needing 32 GB of RAM per box.

All of the packages are deployed with the following options in their DESCRIPTION files:

LazyData: false
LazyLoad: false

They are also added to the preload section of serverconf.yml.j2. RData files are loaded within an .onLoad function by calling utils::data.
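That loading pattern looks roughly like this (the dataset name is a placeholder):

.onLoad <- function(libname, pkgname) {
  # load the packaged .RData dataset into the package namespace at load time;
  # inside .onLoad, parent.env(environment()) is the package namespace
  utils::data("mydataset", package = pkgname,
              envir = parent.env(environment()))
}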

Also, keeping in mind that I'm using OpenCPU without GitHub and with only one-way communication (from the backend to the ocpu box), which options do you suggest turning on or optimizing? This isn't clearly covered in the docs yet.


Solution

  • It mostly depends on which packages you are using and what you are doing. Can you run the same functionality that you invoke through OpenCPU locally (on the command line) without running out of memory?
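    One way to check is to measure peak memory for a single call in a plain R session. A minimal sketch, with placeholder package and function names:

    gc(reset = TRUE)                       # reset R's peak-memory counters
    result <- mypackage::myfunction(x = 1)
    gc()                                   # the "max used" column shows the peak since the reset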

    Apache2 prefork creates worker processes to handle concurrent requests. Each of these workers contains an R process with all preloaded packages. So if one request takes 500 MB, the total memory consumption on the server is n * 500 MB, where n is the number of workers loaded.
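    Concretely: on an 8 GB box, 16 workers at 500 MB each already account for the full 8 GB, so default prefork limits can oversubscribe memory as soon as enough concurrent requests arrive.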

    Depending on how many concurrent requests you expect, you could try lowering StartServers or MaxRequestWorkers in your apache2 config.
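    On Debian/Ubuntu these directives usually live in /etc/apache2/mods-available/mpm_prefork.conf. The values below are illustrative only, not a recommendation:

    <IfModule mpm_prefork_module>
        StartServers             4
        MinSpareServers          2
        MaxSpareServers          6
        MaxRequestWorkers        12
        # recycle workers periodically to bound per-process memory growth
        MaxConnectionsPerChild   100
    </IfModule>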

    Also try raising (or lowering) the rlimit.as option in /etc/opencpu/server.conf, which limits the amount of memory (address space) that a single process is allowed to consume.
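    The OpenCPU server config is JSON, so capping each R process at roughly 2 GB of address space would look like this (the value is illustrative):

    {
      "rlimit.as": 2e9
    }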