Tags: multithreading, optimization, optaplanner

Optaplanner: Randomly very low "average calculate count per second" when solving in parallel


I am using Optaplanner to solve a comparatively small optimization problem. For my use case, many such optimizations are required, which is why I started to run them in parallel. The parallelism is based on Java 8's parallel streams, which do not allow controlling the actual number of threads, but I believe it defaults to the available CPU count.
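
Roughly, the parallel setup looks like the following sketch (the solver config resource path is a placeholder, and whether solve() returns the best solution directly depends on the OptaPlanner version; in older versions you would call getBestSolution() after solving):

import java.util.List;
import java.util.stream.Collectors;

import org.optaplanner.core.api.solver.Solver;
import org.optaplanner.core.api.solver.SolverFactory;

public class ParallelSolving {

    // Solves each problem on the common ForkJoinPool via a parallel stream.
    // Solution_ is the planning solution class defined elsewhere in the project.
    public static <Solution_> List<Solution_> solveAll(List<Solution_> problems,
                                                       String solverConfigResource) {
        SolverFactory<Solution_> solverFactory =
                SolverFactory.createFromXmlResource(solverConfigResource);
        return problems.parallelStream()
                .map(problem -> {
                    // A Solver instance is not thread-safe, so build one per problem.
                    Solver<Solution_> solver = solverFactory.buildSolver();
                    return solver.solve(problem);
                })
                .collect(Collectors.toList());
    }
}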

For most of the solver runs this seems to work fine, but I noticed that I sometimes got invalid solutions from a single run, which were not reproducible when that problem was solved alone.

After inspecting the logs, I noticed that the "average calculate count per second" was very low for the invalid solution, while being fine for the other runs. In fact, the invalid solution was just the (naively built) initial solution:

[rkJoinPool.commonPool-worker-6] (DefaultSolver.java:203)       Solving started: time spent (0), best score (-5hard/-2medium/168soft), environment mode (REPRODUCIBLE), random (JDK with seed 0).
[rkJoinPool.commonPool-worker-6] (DefaultConstructionHeuristicPhase.java:158) Construction Heuristic phase (0) ended: step total (0), time spent (1), best score (-5hard/-2medium/233soft).
[rkJoinPool.commonPool-worker-4] (DefaultSolver.java:203)       Solving started: time spent (1), best score (-5hard/-1medium/579soft), environment mode (REPRODUCIBLE), random (JDK with seed 0). 
[rkJoinPool.commonPool-worker-4] (DefaultConstructionHeuristicPhase.java:158) Construction Heuristic phase (0) ended: step total (0), time spent (1), best score (-5hard/-1medium/617soft).
[rkJoinPool.commonPool-worker-5] (DefaultSolver.java:203)       Solving started: time spent (1), best score (-6hard/-3medium/137soft), environment mode (REPRODUCIBLE), random (JDK with seed 0).
[rkJoinPool.commonPool-worker-7] (DefaultLocalSearchPhase.java:152) Local Search phase (1) ended: step total (42), time spent (704), best score (0hard/0medium/808soft).
[rkJoinPool.commonPool-worker-4] (DefaultLocalSearchPhase.java:152) Local Search phase (1) ended: step total (22), time spent (218), best score (0hard/0medium/1033soft). 
[rkJoinPool.commonPool-worker-5] (DefaultSolver.java:238)       Solving ended: time spent (210), best score (-6hard/-3medium/137soft), average calculate count per second (4), environment mode (REPRODUCIBLE).
[rkJoinPool.commonPool-worker-7] (DefaultSolver.java:238)       Solving ended: time spent (746), best score (0hard/0medium/808soft), average calculate count per second (25256), environment mode (REPRODUCIBLE).
[rkJoinPool.commonPool-worker-4] (DefaultSolver.java:238)       Solving ended: time spent (219), best score (0hard/0medium/1033soft), average calculate count per second (30461), environment mode (REPRODUCIBLE).

Notice how threads 4 and 7 produce good results with 25-30k average calculate counts per second (accs), while thread 5 produced an invalid result and only achieved 4 accs (given the 200 ms unimproved-time termination, I assume that really only one step was taken).

The following configuration was used; it was determined using the benchmarker (albeit in a single-threaded setup):

<termination>
    <millisecondsSpentLimit>2000</millisecondsSpentLimit>
    <unimprovedMillisecondsSpentLimit>200</unimprovedMillisecondsSpentLimit>
</termination>
<constructionHeuristic>
    <constructionHeuristicType>FIRST_FIT</constructionHeuristicType>
</constructionHeuristic>
<localSearch>
    <localSearchType>HILL_CLIMBING</localSearchType>
</localSearch>

I assume that this problem has to do with several solvers running in parallel while a time-based termination criterion is used. Is the termination time based on "wall time" or on actual CPU time?

Is using a time-based termination criterion a bad idea when running in parallel? It still seems to be the best way to use all available computing power. What could cause a single solver to perform so few steps, seemingly at random?


Solution

  • millisecondsSpentLimit and unimprovedMillisecondsSpentLimit are based on wall time, not on actual CPU time.

    AFAIK, parallel streams do not limit the number of threads to the number of CPUs, because those jobs might block on IO (which is not the case for Solver.solve() calls). I prefer to use an ExecutorService with a thread pool size of Math.max(1, Runtime.getRuntime().availableProcessors() - 2), as sketched below.
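
    A minimal sketch of that approach, assuming the same generic solution class and SolverFactory setup as in the question (the exact Solver API depends on the OptaPlanner version):

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

import org.optaplanner.core.api.solver.Solver;
import org.optaplanner.core.api.solver.SolverFactory;

public class ExecutorSolving {

    public static <Solution_> List<Solution_> solveAll(List<Solution_> problems,
                                                       SolverFactory<Solution_> solverFactory)
            throws Exception {
        // Leave a couple of CPUs free so each solver thread gets a core of its own
        // and the wall-clock-limited solvers are not starved by other work.
        int threadCount = Math.max(1, Runtime.getRuntime().availableProcessors() - 2);
        ExecutorService executor = Executors.newFixedThreadPool(threadCount);
        try {
            List<Future<Solution_>> futures = new ArrayList<>();
            for (Solution_ problem : problems) {
                // One fresh Solver per problem; Solver instances are not thread-safe.
                futures.add(executor.submit((Callable<Solution_>) () ->
                        solverFactory.buildSolver().solve(problem)));
            }
            List<Solution_> results = new ArrayList<>();
            for (Future<Solution_> future : futures) {
                results.add(future.get());
            }
            return results;
        } finally {
            executor.shutdown();
        }
    }
}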