java parallel-processing java.util.concurrent

Why does this ConcurrentHashMap stream only run half of entries at a time?

I'm mystified by this curiosity. (I'm using ConcurrentHashMap rather than ConcurrentSkipListSet because the class doesn't implement Comparable.) I've got plenty of free CPUs on the computer and there is no difference between the classes that are run in the stream (other than random number generation). It's suspicious that the even numbers run first (consistently).

Here are the code and output with nRuns=10. I would expect all 10 threads to fire up and run simultaneously (as they usually do in my other uses of ConcurrentHashMap). Could it be due to some static code in LIBSVM that gets called by SvmCrossValidator? That's all I can think of. It seems to me from a basic Java perspective this stream should launch all 10 processes at once.

// instantiate and run nRuns times
ConcurrentHashMap<Integer,SvmCrossValidator> scvMap = new ConcurrentHashMap<>();
for (int i=0; i<nRuns; i++) {
    scvMap.put(i, new SvmCrossValidator(param, nrFold, inputFilename, nCases, nControls));
}
// parallel stream
scvMap.entrySet().parallelStream().forEach(entry -> {
        System.err.println("SVM run "+entry.getKey()+" started.");
        entry.getValue().run();
        System.err.println("SVM run "+entry.getKey()+" finished.");
    });

Output:

SVM run 2 started.
SVM run 0 started.
SVM run 6 started.
SVM run 4 started.
SVM run 8 started.

LONG wait here while these first five grind away...

SVM run 8 finished.
SVM run 9 started.
SVM run 6 finished.
SVM run 7 started.
SVM run 0 finished.
SVM run 1 started.
SVM run 2 finished.
SVM run 3 started.
SVM run 4 finished.
SVM run 5 started.
SVM run 9 finished.
SVM run 1 finished.
SVM run 7 finished.
SVM run 3 finished.
SVM run 5 finished.

Solution

I think two things affect this. First, add the thread name to your System.err output so it is clear which worker thread handles each run:

System.err.println("SVM run "+entry.getKey()+" started." +' '+Thread.currentThread().getName());
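If you want to try this without LIBSVM, here is a minimal self-contained sketch. The class name ThreadNameDemo, the fakeRun() method and the 100 ms sleep are stand-ins I made up for SvmCrossValidator.run(); only the map and parallelStream() usage mirrors the question.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ThreadNameDemo {
    // Hypothetical stand-in for SvmCrossValidator.run(): it just sleeps briefly.
    static void fakeRun() {
        try {
            Thread.sleep(100);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Runs nRuns fake tasks through a parallel stream and records
    // the name of the thread that handled each key.
    static Map<Integer, String> runAndRecord(int nRuns) {
        ConcurrentHashMap<Integer, Runnable> tasks = new ConcurrentHashMap<>();
        for (int i = 0; i < nRuns; i++) {
            tasks.put(i, ThreadNameDemo::fakeRun);
        }
        Map<Integer, String> threadByKey = new ConcurrentHashMap<>();
        tasks.entrySet().parallelStream().forEach(entry -> {
            String name = Thread.currentThread().getName();
            System.err.println("run " + entry.getKey() + " started on " + name);
            entry.getValue().run();
            threadByKey.put(entry.getKey(), name);
        });
        return threadByKey;
    }

    public static void main(String[] args) {
        // Print which thread ended up handling each key.
        runAndRecord(10).forEach((key, name) -> System.out.println(key + " -> " + name));
    }
}
```

Keys that share a thread name were processed one after the other by the same worker, which is what you are looking for when runs appear to wait on each other.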

Second, the system property java.util.concurrent.ForkJoinPool.common.parallelism sets the parallelism of the common ForkJoinPool, which affects how many work queues (and so worker threads) the pool has available. See this constructor or the Javadoc for ForkJoinPool:

private ForkJoinPool(byte forCommonPoolOnly)
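To check what your JVM is actually using, something like the following should do. CommonPoolInfo and commonParallelism() are names I picked for illustration, and the value 16 is arbitrary. As far as I know the property is only read when the ForkJoinPool class is first initialized, so passing it with -D on the command line is the safer way to set it.

```java
import java.util.concurrent.ForkJoinPool;

public class CommonPoolInfo {
    // Target parallelism of the common pool that parallel streams use by default.
    static int commonParallelism() {
        return ForkJoinPool.getCommonPoolParallelism();
    }

    public static void main(String[] args) {
        // Only takes effect if nothing has initialized ForkJoinPool yet.
        // Safer alternative on the command line:
        //   java -Djava.util.concurrent.ForkJoinPool.common.parallelism=16 CommonPoolInfo
        System.setProperty("java.util.concurrent.ForkJoinPool.common.parallelism", "16");

        System.out.println("available processors:    " + Runtime.getRuntime().availableProcessors());
        System.out.println("common pool parallelism: " + commonParallelism());
    }
}
```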

However, parallelStream() creates a spliterator, which I think also makes its own choices based on the size of the content. Those choices determine how many chunks the work is split into, and therefore how many tasks can run at once, regardless of the size of the ForkJoinPool.
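To get a feel for how the entry set is carved up, here is a rough sketch that splits the map's spliterator by hand and stops once the size estimate drops to 1. It only approximates what the stream framework does (the real leaf size depends on the pool's parallelism and on JDK internals), and SplitDemo, split() and chunksFor() are names I made up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Spliterator;
import java.util.concurrent.ConcurrentHashMap;

public class SplitDemo {
    // Keeps splitting while the size estimate exceeds targetSize,
    // then collects the keys of each non-empty leaf chunk.
    static void split(Spliterator<Map.Entry<Integer, String>> s,
                      long targetSize,
                      List<List<Integer>> chunks) {
        Spliterator<Map.Entry<Integer, String>> other;
        if (s.estimateSize() > targetSize && (other = s.trySplit()) != null) {
            split(other, targetSize, chunks);
            split(s, targetSize, chunks);
            return;
        }
        List<Integer> keys = new ArrayList<>();
        s.forEachRemaining(e -> keys.add(e.getKey()));
        if (!keys.isEmpty()) {
            chunks.add(keys);
        }
    }

    // Builds a map with keys 0..nRuns-1, as in the question, and returns its chunks.
    static List<List<Integer>> chunksFor(int nRuns) {
        ConcurrentHashMap<Integer, String> map = new ConcurrentHashMap<>();
        for (int i = 0; i < nRuns; i++) {
            map.put(i, "run " + i);
        }
        List<List<Integer>> chunks = new ArrayList<>();
        split(map.entrySet().spliterator(), 1, chunks);
        return chunks;
    }

    public static void main(String[] args) {
        // Each printed list is one chunk that a single task would process sequentially.
        chunksFor(10).forEach(System.out::println);
    }
}
```

If the chunks come out as pairs of neighbouring keys, that would match the output in the question, where five runs start and each one is followed by its neighbour on the same thread.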

Changing java.util.concurrent.ForkJoinPool.common.parallelism may therefore not affect the outcome unless you make nRuns much bigger, at which point the stream uses more of the ForkJoinPool.commonPool-worker threads.

So with a few tests on my machine:

nRuns=10 never used more than 5 worker threads, even with parallelism=128; it re-used the same worker threads even though many more were available.
nRuns=1000 reached around 130 tasks running at the same time with parallelism=128. Note that the parallelism value is not the same as the number of worker threads used.
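A test along these lines can be sketched as follows. This is not the exact code I ran; PeakConcurrency and measurePeak() are illustrative names, and a short sleep stands in for the real work. The numbers you get will depend on your machine and on the parallelism setting.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

public class PeakConcurrency {
    // Returns the highest number of tasks observed running at the same time.
    static int measurePeak(int nRuns, long sleepMillis) {
        ConcurrentHashMap<Integer, Integer> map = new ConcurrentHashMap<>();
        for (int i = 0; i < nRuns; i++) {
            map.put(i, i);
        }
        AtomicInteger running = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        map.entrySet().parallelStream().forEach(entry -> {
            int now = running.incrementAndGet();
            peak.accumulateAndGet(now, Math::max); // remember the largest value seen
            try {
                Thread.sleep(sleepMillis);         // stand-in for the real work
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            running.decrementAndGet();
        });
        return peak.get();
    }

    public static void main(String[] args) {
        System.out.println("nRuns=10   peak=" + measurePeak(10, 50));
        System.out.println("nRuns=1000 peak=" + measurePeak(1000, 50));
    }
}
```

Run it with different -Djava.util.concurrent.ForkJoinPool.common.parallelism values to compare small and large nRuns.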