
Who manages (creates, allocates memory, etc.) R sessions / R processes within OpenCPU?


We have an OpenCPU cloud server, installed on a RedHat server with Apache 2.0 and rApache, which runs some quite memory- and processing-intensive calculations. Our app runs rather slowly (slower than on a less powerful laptop); we suspect this is because of how memory is allocated on the server. For this reason we parallelized the app for the server (using the parallel package), but even though the server can normally run many (more than 20) parallel R jobs, our app can only run around 18.

In order to understand what is going on, my question is: when I call an R function through the OpenCPU web interface, which component of the server creates/spawns the R processes and manages their memory allocation? Is it mod_R (rApache) or the Apache server itself, through some other modules? Does the Prefork MPM have an effect on this (based on this answer)? Which part of this work is done by OpenCPU?

I have read the OpenCPU documentation, the rApache documentation, and all Stack Overflow questions on OpenCPU, but I didn't manage to figure out how exactly the R processes are managed. Sorry if I missed something; I'd be really grateful if anybody could point me to the source of this information.


Solution

  • The slowness can be a result of the application requiring packages that are not preloaded, so they have to be loaded again for every single request.

    To speed things up, try adding your packages to the preload list in /etc/opencpu/server.conf, or add R code to /etc/opencpu/Rprofile that loads the required packages / data (see the Rprofile sketch after this list).

    Answering your question:

    • Apache2 prefork maintains a pool of worker processes. The size of the pool, n, is configurable in Apache using StartServers, MinSpareServers, MaxSpareServers, MaxRequestWorkers, and so on. Because each R worker uses a lot of resources, this shouldn't be set too high (see the example prefork configuration after this list).
    • Upon start, each apache2 worker process starts its own private R process. Each R process then loads the opencpu package and its dependencies, loads the preloaded packages, and runs /etc/opencpu/Rprofile. Hence, in total, the server uses n times the amount of memory it takes to load those things in R.
    • Each request gets executed in a random worker, within a temporary sandbox fork of that worker's R process. If the request requires R packages that are not preloaded, those have to be loaded on demand, which is what makes such requests slow.
    • Once the request is completed, the temporary sandbox fork is killed and the worker is cleaned up.
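
For illustration, here is a minimal sketch of what the preload setup in /etc/opencpu/Rprofile could look like. The package names and the data path below are placeholders, not your actual dependencies; the same effect for packages can be achieved via the preload list in /etc/opencpu/server.conf:

```r
# /etc/opencpu/Rprofile -- sketch only; package names and paths are placeholders.
# Everything loaded here is inherited by every request fork, so it is paid for
# once per Apache worker instead of once per request.
library(mypackage)   # hypothetical: replace with the packages your app actually uses
library(data.table)  # hypothetical example of a heavy dependency

# Optionally pre-load large, read-only data as well (path is a placeholder):
# mydata <- readRDS("/path/to/large_dataset.rds")
```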
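
And a sketch of the prefork directives mentioned above. The numbers are purely illustrative and the file location varies by RedHat/Apache version; the point is simply that n workers cost n times the per-process R footprint, so the pool should be sized to the memory a single loaded R process needs on your machine:

```apache
# Illustrative prefork settings only -- tune to your server's memory.
<IfModule mpm_prefork_module>
    StartServers        4
    MinSpareServers     2
    MaxSpareServers     6
    MaxRequestWorkers   8   # keep small: each worker holds a full private R process
</IfModule>
```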