Tags: r, parallel-processing, doparallel

How does R's parallel package handle 12th Gen Efficiency Cores?


I am planning a desktop build with an i7-12700. I mostly need it for CPU-heavy R tasks, and most of the time parallel processing comes into play. But I wonder how the new Efficiency Cores in 12th Gen Intel processors handle this kind of work. I am not an expert on hardware/architecture, so the question might sound stupid, but my concerns are the following.

  1. Suppose I have a task that I want to split over 8 cores. How do I ensure that the task runs on the performance cores, or is this handled automatically by the OS?
  2. With other CPUs I usually set mc.cores to the number of cores on the machine. Is this still the correct approach, or should I use the number of threads instead?

So basically: is there any reason for concern in choosing a 12th Gen CPU with efficiency cores over an 11th Gen CPU, say an i7-12700 over an i7-11700?

Update

So I bought the i7-12700 and tested it myself.

First I tested the default loading order of parallel jobs using the code below, increasing num_cores from 8 all the way up to 20.

The order turned out to be: the first thread of each performance core, then the efficiency core threads, and only then the second thread of each performance core (once its first thread is already loaded).

library(parallel)
# CPU-bound dummy task: spin through x iterations of a trivial assignment
partask <- function(x) for (idx in 1:x) a <- idx
num_cores <- 8   # increase from 8 up to 20 to watch which logical CPUs get loaded
mclapply(rep(5 * (10^8), num_cores), partask, mc.cores = num_cores)

Then I specifically put one thread of each performance core under load using the R script below.

writeLines("for(idx in 1:(10^11)) a<-idx","sclong.R")
for(thread in c(2*(0:7))) system(paste0("taskset -c ",thread," Rscript sclong.R"),ignore.stdout=TRUE,wait=FALSE)
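
If you want to double-check which logical CPU number maps to which physical core before pinning anything, the topology can be inspected from R. This is just a sketch, assuming a Linux system with util-linux installed; the exact numbering can differ between machines, so verify it yourself.

system("lscpu -e")   # CPU column = logical id used by taskset, CORE column = the physical core it belongs to
system("cat /sys/devices/system/cpu/cpu0/topology/thread_siblings_list")   # hyperthread siblings of CPU 0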

And then I benchmarked a task, first on a free thread of a (loaded) performance core and then on an efficiency core thread, using the code below.

writeLines("task <- function(x) for(idx in 1:(x)) a<-idx ; microbenchmark::microbenchmark(task(10^7))","scsmall.R")
system("taskset -c 1 Rscript scsmall.R")
system("taskset -c 16 Rscript scsmall.R")

The result was

> system("taskset -c 1 Rscript scsmall.R")
Unit: milliseconds
       expr      min       lq     mean   median       uq      max neval
 task(10^7) 406.9761 416.4491 426.7444 419.0301 437.0794 464.5673   100
> system("taskset -c 16 Rscript scsmall.R")
Unit: milliseconds
       expr      min       lq    mean   median       uq      max neval
 task(10^7) 422.1205 427.8711 436.172 430.5203 443.5989 463.9686   100

So, contrary to the default loading order, it would have been more efficient to load all threads of the performance cores first and only then move on to the efficiency cores' threads.

So providing the affinity.list parameter to mclapply can be useful in some specific cases.
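
For example, something like the following should pin the eight jobs to the first thread of each performance core. This is only a sketch: the CPU numbers assume the layout observed above (P-core threads first, E-cores last), affinity.list uses 1-based CPU numbers (unlike taskset), and the documentation requires mc.preschedule = FALSE when affinity.list is used, so verify the mapping on your own machine before relying on it.

library(parallel)

partask <- function(x) for (idx in 1:x) a <- idx

# One allowed-CPU set per element of X. affinity.list numbers CPUs from 1
# (taskset numbers them from 0), so taskset CPUs 0, 2, ..., 14 -- the first
# thread of each P-core -- become 1, 3, ..., 15 here.
pcore_first_threads <- as.list(seq(1, 15, by = 2))

mclapply(rep(5 * (10^8), 8), partask,
         mc.cores       = 8,
         mc.preschedule = FALSE,               # required when affinity.list is used
         affinity.list  = pcore_first_threads)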


Solution

  • Processor affinity

    Depending upon your operating system, you can tell the OS not to use (or to deprioritize) certain cores for certain processes. This is typically referred to as processor affinity. Here is some discussion of why you might (not) want to do this. The parallel package appears to have a built-in function to do this directly from R, although beware of incorrect indexing of the cores; see the sketch after the OS-specific notes below.

    Windows

    You can assign specific processes to specific cores in Windows using the Processor Affinity options. Here's an example of how to do this.

    Apple

    Here's basically the same question for Apple operating systems.

    Linux

    I have no experience with this, but here's a discussion of how to do it.
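
    The built-in function alluded to above is presumably parallel::mcaffinity(), which gets or sets the affinity mask of the current R process on platforms that support it (Linux, in current R versions). A minimal sketch follows; note the indexing pitfall, since mcaffinity() numbers CPUs from 1 while taskset numbers them from 0.

    library(parallel)
    mcaffinity()      # NULL if unsupported, otherwise the CPUs this process may use (1-based)
    mcaffinity(1:16)  # restrict this R process, and anything it forks, to CPUs 1-16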

  • Number of cores/threads

    "Should" in this case is very subjective and so best avoided in questions. However, I'll take a stab: in many cases, "should" is precluded by "cannot" do so with respect to multithreading. I recommend reading this for discussion of multi-threading and multi-processing and how to do it (in R). Note that independent of this, many recommend specifying at least 1 core less than the total number of cores on your personal machine to avoid possible crashed or massive degradation of ability to get other stuff done while R is chugging away.