I am doing multi-core computing in R. Here is the code with timings for each computation. Why does the elapsed time increase as the number of cores increases? This is really counter-intuitive: I would expect the elapsed time to decrease as the number of cores increases. Is there any way to fix this?
Here is the code:
library(parallel)
detectCores()
system.time(pvec(1:1e7, sqrt, mc.cores = 1))
system.time(pvec(1:1e7, sqrt, mc.cores = 4))
system.time(pvec(1:1e7, sqrt, mc.cores = 8))
Thank you.
Suppose that your data is divided into N parts and each part takes T seconds to compute. On a single-core architecture you expect all the work to finish in N x T seconds, and you hope that on an N-core machine it finishes in about T seconds. However, parallel computing has a communication cost, paid by every core (initializing the worker, passing data from the main process to the child, calculating, passing the result back, and finalizing). Let this communication cost be C seconds and, for simplicity, assume it is constant for all cores. Then on an N-core machine the computation takes roughly
T + N x C
seconds, where T is the calculation time and N x C is the total communication time. Comparing this to the single-core machine, the inequality
(N x T) > (T + N x C)
must hold for parallelization to save any time, at least under these assumptions. Simplifying the inequality gives
C < (N x T - T) / N = T x (N - 1) / N
so if the constant per-core communication time C is not smaller than the ratio (N x T - T) / N, there is no gain in making the computation parallel.
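To make the inequality concrete, here is a small numeric check. The values T = 2, N = 4, and the two choices of C are made-up illustration numbers, not measurements:

```r
# Made-up illustration numbers (not measurements): T = 2 s per part, N = 4 parts
T_sec <- 2                             # T: seconds to compute one part
N     <- 4                             # N: number of parts / cores
threshold <- (N * T_sec - T_sec) / N   # C must stay below this for a gain
threshold                              # (4*2 - 2)/4 = 1.5 seconds

C_sec <- 0.5
(T_sec + N * C_sec) < (N * T_sec)      # TRUE: parallel wins when C = 0.5 s
C_sec <- 2
(T_sec + N * C_sec) < (N * T_sec)      # FALSE: parallel loses when C = 2 s
```

With C = 0.5 s the parallel run takes 2 + 4 x 0.5 = 4 s against 8 s on a single core; with C = 2 s it takes 2 + 4 x 2 = 10 s, which is worse than the single-core 8 s.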
In your example, sqrt is so cheap that the time needed for worker creation, calculation, and communication exceeds the single-core computation time, so adding cores makes the elapsed time worse.
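One way to see the flip side is to give each worker enough work to amortize the overhead. A sketch, assuming a Unix-like system (mc.cores > 1 relies on forking, which is not available on Windows), where the cheap sqrt is replaced by a deliberately expensive per-element function (`heavy` is a made-up name for illustration):

```r
library(parallel)

x <- 1:1e4

# A deliberately expensive per-element function: now the per-worker
# communication overhead is small relative to the computation itself.
heavy <- function(v) sapply(v, function(i) sum(sqrt(seq_len(1000))))

system.time(r1 <- pvec(x, heavy, mc.cores = 1))
system.time(r4 <- pvec(x, heavy, mc.cores = 4))

identical(r1, r4)  # same result; the 4-core run is typically faster here
```

The point is not this particular function but the ratio: the more seconds of real computation each chunk carries, the smaller the fixed communication cost C is in proportion, and the sooner the inequality above starts to favor more cores.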