Tags: r, foreach, parallel-processing, parallel-foreach

CPU usage when using foreach in R


I was using foreach to do parallel computation in R:

    library(doParallel)  # also loads foreach and parallel

    no_cores <- detectCores()
    registerDoParallel(no_cores)

    temp <- foreach(i = 320:530,
                    .combine = rbind) %dopar% {
      track(data = data[i, ], para = currenttime)
    }

but I realised that some CPU cores were barely utilised, let alone fully used.

[Screenshot: CPU usage monitor showing several mostly idle cores]

Is there a setting I missed? Is there a way to improve CPU utilisation and speed up the run?


Solution

  • Some thoughts on this:

    • You may only have 4 physical cores but 8 logical cores because hyperthreading is enabled on your computer. Your problem may only be able to make good use of 4 cores. If so, you might be getting worse performance by starting 8 workers. In that case, it may be better to use:

      no_cores <- detectCores(logical=FALSE)
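      For example, a minimal sketch of registering only the physical cores (and cleaning up the workers afterwards), assuming the doParallel backend from the question:

      ```r
      library(doParallel)

      # Count physical cores only, ignoring hyperthreaded logical cores
      no_cores <- detectCores(logical = FALSE)

      # Create an explicit cluster so it can be stopped cleanly later
      cl <- makeCluster(no_cores)
      registerDoParallel(cl)

      # ... run your foreach loops here ...

      stopCluster(cl)  # release the worker processes when done
      ```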

    • track may not be very compute intensive; if it spends most of its time on I/O or memory operations, it will not use much CPU time, and the cores will look idle.

    • If track is CPU intensive but doesn't take much time to execute (less than a millisecond, for example), the master process may become a bottleneck, especially if track returns a lot of data.

    Possible solutions:

    • Verify that your computer has enough memory to support the workers that you start by using your computer's process monitoring tools. If necessary, reduce the number of workers to stay within your resources.

    • You might get better results by using chunking techniques so there is only one task per worker. This makes the workers more efficient and reduces the post-processing done by the master.
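    One way to chunk is with `isplitVector` from the itertools package, which splits the index vector into one chunk per worker so each worker receives a single large task instead of 211 small ones. This is a sketch only; `track`, `data`, and `currenttime` are the asker's objects from the question:

    ```r
    library(doParallel)
    library(itertools)  # provides isplitVector for chunking

    no_cores <- detectCores(logical = FALSE)
    registerDoParallel(no_cores)

    temp <- foreach(ix = isplitVector(320:530, chunks = no_cores),
                    .combine = rbind) %dopar% {
      # Each worker processes a whole chunk of indices in one task,
      # combining its own results locally before returning them
      do.call(rbind, lapply(ix, function(i) {
        track(data = data[i, ], para = currenttime)
      }))
    }
    ```

    Because each worker now returns one pre-combined block, the master only has to `rbind` as many pieces as there are workers.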

    • Try experimenting with foreach options such as .maxcombine. Setting it to be greater than the number of tasks may help.
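    The loop in the question has 211 tasks (320:530), so one option is to set `.maxcombine` above that, letting `rbind` be called once on all results rather than repeatedly on batches. A sketch, again using the asker's `track`, `data`, and `currenttime`:

    ```r
    temp <- foreach(i = 320:530,
                    .combine = rbind,
                    .maxcombine = 250) %dopar% {  # > 211 tasks, so one rbind call
      track(data = data[i, ], para = currenttime)
    }
    ```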

    • Combining the results by row isn't as efficient as combining by column, but this may not be a problem if you're chunking.