Tags: java, parallel-processing, benchmarking, mergesort, forkjoinpool

Parallel Mergesort benchmarking - determining the threshold


I'm trying to identify a reasonable threshold at which to stop sub-dividing in my implementation of Mergesort.

However, the results I get suggest that the threshold should be somewhere between 10^7 and 10^8, which seems absurd given that the default threshold used by Java is around 8192. It's basically telling me that sub-dividing is almost always bad, and that higher thresholds are better because they perform fewer splits.

The work it currently does is to sort an array of 10^8 floats, with values drawn at random from the range 0 to 1000. The same random array is reused for each tested threshold value.
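For context, a minimal sketch of that setup (the class and method names here are illustrative, not from my actual harness): one master array is generated once from a fixed seed, and each threshold test sorts its own copy so every run sees identical input.

```java
import java.util.Random;

public class BenchSetup {
    // Build one random array; each threshold test gets its own copy so
    // every run sorts identical input.
    static float[] makeInput(int size, long seed) {
        Random rnd = new Random(seed);
        float[] a = new float[size];
        for (int i = 0; i < size; i++)
            a[i] = rnd.nextFloat() * 1000f; // random range 0..1000
        return a;
    }

    public static void main(String[] args) {
        float[] master = makeInput(1_000_000, 42); // smaller than 10^8 for a quick demo
        for (int threshold : new int[] {1 << 13, 1 << 16, 1 << 20}) {
            float[] copy = master.clone(); // fresh identical input per threshold
            // ... sort `copy` with the tested threshold and record the time
        }
    }
}
```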

import java.util.Arrays;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;

public class ParallelMergeSort extends SortStrategy {

    @Override
    public long sort(float[] a, int cores, int threshold) {
        System.gc();
        SortTask.threshold = threshold;
        RecursiveAction mainTask = new SortTask(a, 0, a.length - 1);
        ForkJoinPool pool = new ForkJoinPool(cores);
        long start = System.nanoTime(); // time only the sort, not pool creation
        pool.invoke(mainTask);
        return System.nanoTime() - start;
    }

    private static class SortTask extends RecursiveAction {
        private float[] a;
        private int left, right;
        private static int threshold;

        SortTask(float[] a, int left, int right) {
            this.a = a;
            this.left = left;
            this.right = right;
        }

        @Override
        protected void compute() {
            if (left < right) {
                if ((right - left) < threshold) {
                    Arrays.sort(a, left, right + 1);
                } else {
                    int mid = left + (right - left) / 2; // avoids int overflow for large indices
                    invokeAll(
                        new SortTask(a, left, mid),
                        new SortTask(a, mid + 1, right)
                    );
                    // Merge
                    int n1 = mid - left + 1;
                    int n2 = right - mid;
                    float[] a1 = new float[n1];
                    float[] a2 = new float[n2];
                    // Fill sub arrays
                    for (int i = 0; i < n1; ++i)
                        a1[i] = a[left + i];
                    for (int j = 0; j < n2; ++j)
                        a2[j] = a[mid + 1 + j];
                    // Sort and merge
                    int l = 0, r = 0, o = left;
                    while (l < a1.length && r < a2.length) {
                        if (a1[l] <= a2[r])
                            a[o++] = a1[l++];
                        else
                            a[o++] = a2[r++];
                    }
                    // Merge remaining
                    while (l < a1.length)
                        a[o++] = a1[l++];
                    while (r < a2.length)
                        a[o++] = a2[r++];
                }
            }
        }
    }
}

I know that the JVM can be unreliable due to JIT compilation, but shouldn't that only affect the first few iterations? I'm looking for advice on the algorithm, or an explanation of why my results are so far from what I expected.
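To rule JIT effects out, my harness could run a few untimed warm-up passes before measuring, and report the median of several timed runs rather than the first one. A hedged sketch (class and method names are illustrative):

```java
import java.util.Arrays;
import java.util.Random;

public class SortBench {
    // Median wall time (ns) over `runs` measured sorts of a copy of `data`,
    // after `warmups` untimed passes so the JIT compiles the hot paths first.
    static long medianSortNanos(float[] data, int warmups, int runs) {
        for (int i = 0; i < warmups; i++)
            Arrays.sort(data.clone()); // warm-up: results discarded
        long[] times = new long[runs];
        for (int i = 0; i < runs; i++) {
            float[] copy = data.clone(); // identical input every run
            long start = System.nanoTime();
            Arrays.sort(copy);
            times[i] = System.nanoTime() - start;
        }
        Arrays.sort(times);
        return times[runs / 2]; // median is robust to outlier runs
    }

    public static void main(String[] args) {
        Random rnd = new Random(42);
        float[] data = new float[1_000_000];
        for (int i = 0; i < data.length; i++)
            data[i] = rnd.nextFloat() * 1000f;
        System.out.println("median sort time (ns): " + medianSortNanos(data, 5, 5));
    }
}
```

The same warm-up-then-median pattern applies to the parallel sort; swap `Arrays.sort` for the tested implementation.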


Solution

  • The optimal threshold is the one that allows as many threads to run in parallel as there are cores in your system.

    If your system has cores cores, the threshold should be initialized with

    SortTask.threshold = cores > 0 ? (a.length + cores - 1) / cores : a.length;
    

    The speed improvement will be less than the number of cores because the last few merge phases cannot be run in parallel.

    Since you are sorting an array of 10^8 elements, the optimal threshold is indeed somewhere between 10^7 and 10^8, unless you have more than 10 cores.
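    Plugging numbers into the formula above makes this concrete (the class name below is just for illustration):

    ```java
    public class ThresholdExample {
        // Ceiling division: the smallest threshold that splits `length`
        // elements into at most `cores` leaf tasks.
        static int thresholdFor(int length, int cores) {
            return cores > 0 ? (length + cores - 1) / cores : length;
        }

        public static void main(String[] args) {
            int length = 100_000_000; // 10^8 elements, as in the question
            for (int cores : new int[] {2, 4, 8, 16}) {
                System.out.println(cores + " cores -> threshold " + thresholdFor(length, cores));
            }
            // With 8 cores the threshold is 12,500,000 -- squarely inside the
            // 10^7..10^8 range the benchmark reported as optimal.
        }
    }
    ```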