I'm trying to fit +- 70.000 values as a function of two variables using the loess()
function several times. I want to use this fit to de-trend the data. My problem is that once I start the loess function, the R session takes up all available cores on the system, and that would be inconsiderate towards other users on the same computing cluster.
The relevant code would be analogous to the following:
# Approximation of the data
df <- data.frame(y = rpois(70000, rnorm(70000, 10, 2)), # y is count data
x = 50000 - rpois(70000, 100),
z = runif(70000))
# The problematic operation
fit <- loess(y ~ x + z, data = df)
When I run this example on my local machine, it only takes up 1 core, but on the cluster it takes as many cores as it could get (up to 48). Ideally, I would loess()
to run on only 1 core.
I've tried to trace any multicore parameters in the code of loess, which I couldn't find. I know that loess calls stats:::simpleLoess
, which in turn calls C code, which in turn calls Fortran code. I have no experience in C or Fortran and I haven't been able to figure out how I can restrict the CPU usage for this function.
Does anyone has any suggestion on how I can limit the CPU usage of the loess function?
I am not knowledgeable enough to comment on specifics about how all of this works, but I know that C++ and FORTRAN for R are usually built using the OpenMP framework for multi-thread programming. Empirically, I do know that your issue can be resolved if you set the OMP_NUM_THREADS
argument before you launch R or if you set it within an R session.
Let's say you wanted to use 2 threads for the loess
function. Before you launch R, you would do this ($
to signify typing this in a shell session):
$ OMP_NUM_THREADS=2 R [whatever other options you use to launch R]
Here's how to do it from within R (>
to indicate an interactive R session):
> Sys.setenv("OMP_NUM_THREADS" = 2)
If you ever need to check the variable from within R, you can do the following (this will return a character vector with the number):
> Sys.getenv("OMP_NUM_THREADS")
# The result in our example will be "2"
For completeness, be sure to use ?Sys.setenv
or ?Sys.getenv
if you wish to get more information about those functions, and check out this site for details about OMP_NUM_THREADS
.
Hope that helps!