Tags: r, foreach, hpc, doparallel

Do R sessions always use one CPU core unless specifically instructed otherwise?


On occasions where heavy compute is required, I've used the doParallel package to dispatch work over multiple cores. A random example:

  library(doParallel)  # also loads parallel (for detectCores()) and foreach
  library(tm)          # for DocumentTermMatrix() and Terms()

  if (detectCores() - 1 > 1) {
    cl <- makeCluster(detectCores() - 1)
    registerDoParallel(cl)

    # dictionary = Terms(tdm) reuses the vocabulary of a previously built tdm
    dtm <- DocumentTermMatrix(corpus, control = list(
      dictionary = Terms(tdm), removePunctuation = TRUE, stopwords = TRUE,
      stemming = TRUE, removeNumbers = TRUE))

    stopCluster(cl)
  }
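
For completeness, here is the shape of an explicit foreach() dispatch over such a cluster; the per-iteration work below is just a placeholder:

  library(doParallel)  # loads foreach and parallel as dependencies

  cl <- makeCluster(detectCores() - 1)
  registerDoParallel(cl)

  # each iteration runs on one of the cluster workers;
  # .combine = c collects the results into a single vector
  res <- foreach(i = 1:8, .combine = c) %dopar% {
    sum(sqrt(seq_len(i * 1e6)))
  }

  stopCluster(cl)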

But the vast majority (probably 99.5%) of R code I write is not wrapped in additional code that explicitly spreads the work across more than one core.

Is it fair to assume this code runs on a single core? Or would answering that require delving into each library used and its functions (e.g. tidyverse, data.table, etc.)?

Note: aside from some timed experiments, I do not know much about how R and the hardware interact, so if my understanding is flawed (e.g. wrong assumptions), please point that out.

Background

The reason this is of great interest is to help decide between fewer cores at a higher clock speed vs. more cores at a lower clock speed, à la the latest MacBook. It would be unfortunate to pay more for a 'better' processor, only to have most day-to-day R tasks run slower due to the lower clock speed (presuming they run on only one core).



Solution

  • Copying and pasting from a Slack discussion we just had on this in another place:

    • This is somewhat wrong / it depends / maybe too narrow a view.

    • For starters, R itself even uses a little OpenMP (on platforms where it can).

    • Next up, you can pick a BLAS that does all your matrix math in parallel (see the BLAS sketch at the end of this answer).

    • Next up is client code that can be multi-threaded and often is; package data.table is a great and famous example (see the thread-control sketch below).

    • And maybe (at least to me) also importantly: if I set options(Ncpus = 6) on my six-core desktop, I get install.packages() to install six packages in parallel (see the install sketch at the end of this answer).

    • Same for make -j ...

    I say a little more on R and parallel computing (at different levels) in this arXiv preprint, now published as this (paywalled) WIREs article.

    Now, lastly, you say macOS. That has a slew of other difficulties with OpenMP, for which you should peruse the r-sig-mac list and maybe other repos (again, data.table covers that). I don't use macOS so I cannot say much more, other than that I see a lot of people having issues.

    Lastly, of course, and not to take away from it: yes, R's inner interpreter is single-threaded and will remain so. But that does not mean we should rush out and buy single-core computers. You will get some benefit from more cores, but exactly how much depends critically on the workloads.
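
    To make the BLAS point concrete: whether matrix algebra runs in parallel depends on which BLAS your R is linked against (the reference BLAS is single-threaded; OpenBLAS, MKL, or Apple's Accelerate are multi-threaded). A minimal sketch using the RhpcBLASctl package, which I am assuming here is installed:

      library(RhpcBLASctl)

      m <- matrix(rnorm(2000 * 2000), nrow = 2000)

      blas_set_num_threads(1)                # pin the BLAS to one thread
      t1 <- system.time(crossprod(m))["elapsed"]

      blas_set_num_threads(get_num_cores())  # let it use every core
      tn <- system.time(crossprod(m))["elapsed"]

      # with a multi-threaded BLAS, tn should be noticeably smaller than t1;
      # with the reference BLAS the two will be essentially identical
      c(one_thread = t1, all_threads = tn)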
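
    Similarly for data.table: it parallelises many internal operations with OpenMP, and you can inspect or cap its thread count yourself, for example when you want reproducible single-threaded timings:

      library(data.table)

      getDTthreads(verbose = TRUE)  # how many threads data.table will use, and why

      setDTthreads(1)               # restrict it to a single thread
      setDTthreads(0)               # 0 = use all logical CPUs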
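
    And the parallel-install point as code; the package names here are just illustrative, and the MAKEFLAGS setting is one way to hand -j to make when source packages are compiled:

      # up to six install.packages() subprocesses at once
      options(Ncpus = 6)
      install.packages(c("data.table", "jsonlite", "xml2"))

      # parallel compilation of source packages
      Sys.setenv(MAKEFLAGS = "-j6")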