Search code examples
javaforkjoinpoolrecursivetask

Forkjoinpool VS sequential perofrmance


I am comparing sequential and parallel performance(using ForkJoinPool) of an algorithm(sum of first n numbers):

public class ForkJoinSumCalculator extends RecursiveTask<Long> {
    private static final ForkJoinPool FORKJOINPOOL = new ForkJoinPool();
    private final long[] numbers;
    private final int start;
    private final int end;
    public static final long THRESHOLD = 10_000;       

    public static void main(String[] args) {
        long startTime = System.currentTimeMillis();
        int numLoops = 40;
        for(int i = 1; i <= numLoops; i++) {
            ForkJoinSumCalculator forkJoinSumCalculator = new ForkJoinSumCalculator(LongStream.rangeClosed(1, 100000000).toArray());
            FORKJOINPOOL.invoke(forkJoinSumCalculator);
        }
        System.out.println("Total time parallel:"+ (System.currentTimeMillis() - startTime));

        startTime = System.currentTimeMillis();
        for(int i = 1; i <= numLoops ; i++) {
            long seqSum = 0L;
            for(int j = 1; j <= 100000000 ; j++) {
                seqSum = seqSum + j;
            }
        }
        System.out.println("Total time sequential:"+ (System.currentTimeMillis() - startTime));
    }

    public ForkJoinSumCalculator(long[] numbers) {
        this(numbers, 0, numbers.length);
    }
    private ForkJoinSumCalculator(long[] numbers, int start, int end) {
        this.numbers = numbers;
        this.start = start;
        this.end = end;
    }
    @Override
    protected Long compute() {
        ....splitting the task
        ....or calculating the sum if size is less than THRESHOLD 
    }
}

I tried to vary numLoops for a wide range of values, but always sequential approach perform better and that too by order of 3-4.

Shouldn't parallel version perform better here given that array size is not that small.


Solution

  • Some suggestions to get the actual output:

    1. For parallel processing, you are creating a long stream of 10^8 elements in each loop iteration. In Java, creating a new object does a lot of things. So, it is affecting the performance of parallel processing. Where, in a single loop, you are not creating any object at all. So the field is not the same to compare.

    You can create an instance, and use every time the reference of that object. So change this part:

    for(int i = 1; i <= numLoops; i++) {
        ForkJoinSumCalculator forkJoinSumCalculator = new ForkJoinSumCalculator(LongStream.rangeClosed(1, 100000000).toArray());
        FORKJOINPOOL.invoke(forkJoinSumCalculator);
    }
    

    to this:

    long[] longs = LongStream.rangeClosed(1, 100000000).toArray();
    for(int i = 1; i <= numLoops; i++) {
        ForkJoinSumCalculator forkJoinSumCalculator = new ForkJoinSumCalculator(longs);
        FORKJOINPOOL.invoke(forkJoinSumCalculator);
    }
    

    2. While you are doing summation in sequential one, you used the primitive variable to declare seqSum, whereas, your parallel task is using the boxed one to return. And boxed calculation takes more time than the primitive one.

    Also, as you are sending an array in the parallel task, I am guessing(you haven't given that code in the post), you are using that array to get the sum. But in the sequential one, you don't need to access an index of any reference. Rather, you are getting the value from the iteration variable. For numbers like 10^8, it is actually a lot to create a difference.

    So change that part from:

    for(int i = 1; i <= numLoops ; i++) {
        long seqSum = 0L;
        for(int j = 1; j <= 100000000 ; j++) {
            seqSum = seqSum + j;
        }
    }
    

    to:

    for(int i = 1; i <= numLoops ; i++) {
        Long seqSum = 0L;
        for(int j = 1; j < 100000000 ; j++) {
            seqSum = seqSum + longs[j];
        }
    }
    

    After all these changes, with a 4 processor machine, I'm getting this:

    Total time parallel:4461
    Total time sequential:25542
    

    And with an 8 processors machine(updated one and much better configured machine):

    Total time parallel:3157
    Total time sequential:16863
    

    Last but not least, your question gave me some points to think about. That's why, Thanks!

    Happy Coding!