In R 3.0.2 on Linux 3.12.0, I am using the system() function to execute a number of tasks. The desired effect is for each of these tasks to run as it would if I had executed it on the command line via Rscript, outside of R's system().
However, when I execute them inside R via system(), each task is tied to the same single CPU as the master R process.
In other words:
When launched via Rscript directly from a bash shell, outside of R, each task runs on its own core where possible (this is desired).
When launched inside R via system(), all tasks run on the same single core. There is no multicore sharing. If I have 100 tasks, they are all stuck on one core.
I cannot figure out how to spawn a process inside of R so that each process will use its own core.
I am using a simple test to consume CPU cycles so I can measure the effect using top/htop:
dd if=/dev/urandom bs=32k count=1000 | bzip2 -9 >> /dev/null
When this simple test is launched outside of R multiple times, each iteration gets its own core. But when I launch it inside of R:
system("dd if=/dev/urandom bs=32k count=2000 | bzip2 -9 >> /dev/null", ignore.stdout = TRUE, ignore.stderr = TRUE, wait = FALSE)
They are all stuck on a single core.
Here is a visualization after running 4 concurrent iterations of system().
Please help: I need to be able to tell R to launch new tasks, with each of them running on its own core.
UPDATE DEC 4 2013:
I tried a test in Python using this:
import os
import thread  # Python 2 module (renamed _thread in Python 3)
thread.start_new_thread(os.system, ("/bin/dd if=/dev/urandom of=/dev/null bs=32k count=2000",))
I repeated the new thread several times, and as expected everything worked (multiple cores used, one per thread).
So I then installed the rPython package in R and tried the same from within R:
python.exec("import os")
python.exec("import thread")
python.exec("thread.start_new_thread(os.system,('/bin/dd if=/dev/urandom of=/dev/null bs=32k count=2000',))")
Unfortunately, once again it was limited to a single core even after repeated calls. Why is it that everything launched is limited to a single core when executed from R?
Following on @agstudy's comment, you should get parallel to work first. On my system, this uses multiple cores:
f <- function(x) {
  system("dd if=/dev/urandom bs=32k count=2000 | bzip2 -9 >> /dev/null",
         ignore.stdout = TRUE, ignore.stderr = TRUE, wait = FALSE)
}
library(parallel)
mclapply(1:4, f, mc.cores = 4)
I would have written this in a comment, but it is too long. I know you have said that you tried the parallel package, but I wanted to confirm that you are using it correctly. If it doesn't work, can you confirm that a non-system call works with mclapply, like this one?
a<-mclapply(rep(1e8,4),rnorm,mc.cores=4)
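A rough way to tell whether mclapply is really spreading work across cores is to time the same CPU-bound job serially and in parallel. This is only a sketch: burn is a made-up helper, the loop size is arbitrary, and it assumes a Unix-like system where mclapply can fork.

```r
library(parallel)

# Hypothetical CPU-bound task: pure arithmetic, no I/O, so the elapsed time
# reflects how many cores were actually used.
burn <- function(i) {
  x <- 0
  for (k in seq_len(2e6)) x <- x + sqrt(k)
  x
}

# Same four jobs, first one after another, then forked across 4 workers.
t_serial <- system.time(res_serial <- lapply(1:4, burn))[["elapsed"]]
t_par    <- system.time(res_par <- mclapply(1:4, burn, mc.cores = 4))[["elapsed"]]

cat(sprintf("serial: %.2fs  parallel: %.2fs\n", t_serial, t_par))
```

On a machine with 4 free cores the parallel time should come out well below the serial one. If the two are about equal, the forked workers are probably sharing a single core.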
Reading your comments, I suspect that your pthreads Linux package is out of date and broken. On my system, I am using libpthread-2.15.so (not 2.13). If you're on Ubuntu, you can grab the latest with apt-get install libpthread-stubs0.
Also, note that you should be using parallel, not multicore. If you look at the docs for parallel, you'll note that they have incorporated the work on multicore.
Reading your next set of comments, I must insist that it is parallel and not multicore that has been included in R since 2.14. You can read about this on the CRAN Task View.
Getting parallel to work is crucial. I previously told you that you could compile it directly from source, but this is not correct. As far as I can tell, the only way to recompile it is to compile R itself from source.
Can you also verify that your CPU affinity is set correctly? Also can you check if R can detect the number of cores? Just run:
library(parallel)
mcaffinity()
# Should be c(1,2,3,4) for you.
detectCores()
# Should be 4 for you.
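The two checks above can be rolled into one report that also looks at what a process started by system() inherits. This is a sketch under some assumptions: affinity_report is a made-up helper, and the child check relies on the Linux-only /proc/self/status file (its Cpus_allowed_list line), so it is skipped elsewhere.

```r
library(parallel)

# Hypothetical helper (not part of any package): gather the CPU limits that
# apply to this R session and to the processes it spawns.
affinity_report <- function() {
  cores <- detectCores()   # logical CPUs the OS reports; may be NA
  mask  <- mcaffinity()    # CPUs this R process may run on; NULL if unsupported

  # A child started by system() inherits R's affinity mask. On Linux the
  # child (here: grep) can read its own limits from /proc/self/status.
  child <- NA_character_
  if (file.exists("/proc/self/status")) {
    out <- system("grep Cpus_allowed_list /proc/self/status", intern = TRUE)
    if (length(out) > 0) child <- out[1]
  }

  list(
    detected   = cores,
    allowed    = mask,
    child      = child,
    # TRUE when R is allowed fewer CPUs than the machine reports
    restricted = !is.null(mask) && !is.na(cores) && length(mask) < cores
  )
}

report <- affinity_report()
str(report)
```

If restricted comes back TRUE, or the child line lists a single CPU, the tasks are being pinned before they even start. mcaffinity() also accepts a vector of CPU numbers, so mcaffinity(seq_len(detectCores())) is one thing to try, though whether that is permitted depends on the system.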