java scala parallel-processing work-stealing

Work-stealing parallel job doesn't seem to steal much work


The idea is to run a parallel job on a 96-core machine, with a work-stealing ForkJoinPool.

Below is the code I'm using so far:

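import scala.collection.parallel.ForkJoinTaskSupport
import scala.collection.parallel.ParSeq
import scala.concurrent.forkjoin.ForkJoinPool

// Turn the items into a parallel sequence backed by a dedicated ForkJoinPool.
val sequence: ParSeq[Item] = getItems().par
sequence.tasksupport = new ForkJoinTaskSupport(new ForkJoinPool())
// Process every item in parallel and collect the results.
val results = for {
  item <- sequence
  res   = doSomethingWith(item)
} yield res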

Here, sequence has about 20,000 items. Most items take 2-8 seconds to process, and only about 200 of them take longer, around 40 seconds.

The problem:

Everything runs fine, but the work-stealing aspect doesn't seem to work well. Here is the expected total CPU load (black) compared to the actual load (blue) over time:

[Figure: expected vs. actual workloads]

Looking at the CPU activity, it's very clear that fewer and fewer cores get used as the job progresses towards the end. During the last 10 minutes or so, only 2 or 3 cores are still busy, processing dozens of items sequentially, one after the other.

How come the items still in the queue don't get stolen by the other free cores, even with a ForkJoinPool, which is supposed to be work-stealing?

https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/ForkJoinPool.html


Solution

  • Each worker thread has its internal task queue, which is protected from work stealing from other threads to limit interactions between workers.

    This probably explains the behavior you're seeing, especially if the long tasks aren't randomly distributed across your item set.
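
One possible workaround, sketched below, is to submit each item to the pool as its own task, so that no worker is left holding a batch of queued items that idle workers cannot pick up. This is only a minimal sketch, assuming the stall really comes from items being handed out in batches. Item and doSomethingWith are hypothetical stand-ins for the types in the question, and processAll and its parallelism parameter are names made up for illustration.

```scala
import java.util.concurrent.ForkJoinPool
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration.Duration

object PerItemTasks {
  // Hypothetical stand-ins for the question's Item and doSomethingWith.
  final case class Item(id: Int)
  def doSomethingWith(item: Item): Int = item.id * 2

  // Runs doSomethingWith on every item, one pool task per item.
  def processAll(items: Seq[Item], parallelism: Int): Seq[Int] = {
    val pool = new ForkJoinPool(parallelism)
    implicit val ec: ExecutionContext = ExecutionContext.fromExecutorService(pool)
    try {
      // One Future per item: the unit of scheduling is a single item,
      // not a batch of consecutive items.
      val futures = items.map(item => Future(doSomethingWith(item)))
      // Future.sequence keeps the results in the original item order.
      Await.result(Future.sequence(futures), Duration.Inf)
    } finally pool.shutdown()
  }
}
```

With around 20,000 items the per-task overhead should be negligible next to a processing time of several seconds per item, though that is worth measuring on the actual workload. If the slow items can be identified up front, submitting them first may shorten the tail further.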