java multithreading callable java-threads

Java Callable: thread pool version sometimes takes longer than the single-threaded version

I have the following example code.

import java.util.*;
import java.lang.*;
import java.io.*;

import java.util.concurrent.*;
public class CalculationThread implements Callable<Long> {

    public Long call() throws Exception {
        long k=0L;
        for(int i=0;i<100000;i++){
            for(int j=0;j<50;j++){
                k=i+j;
            }
        }
        return k;
    }
    public static void main(String[] args) throws InterruptedException {
        ExecutorService executorService = Executors.newFixedThreadPool(4);
        long startTime = System.nanoTime();
        for(int lo=0;lo<5000;lo++){
            Future<Long> result = executorService.submit(new CalculationThread());

            try {
                Long l = result.get();
            } catch (Exception e) {
                result.cancel(true);
            }

        }
        long endTime = System.nanoTime();
        System.out.println("Using threads took "+(endTime - startTime) + " ns");

        executorService.shutdown();
        executorService.awaitTermination(1, TimeUnit.SECONDS);

        long k=0L;
        startTime = System.nanoTime();
        for(int lo=0;lo<5000;lo++){
            for(int i=0;i<100000;i++){
                for(int j=0;j<50;j++){
                    k=i+j;
                }
            }

        }
        endTime = System.nanoTime();
        System.out.println("Generally it takes "+(endTime - startTime) + " ns");

    }
}

The output varies widely between runs, for example:

Using threads took 101960490 ns
Generally it takes 143107865 ns

Using threads took 245339720 ns
Generally it takes 149699885 ns

As you can see, the second line is almost constant while the threaded version varies a lot. Why is that? What can be done to reduce the variability? Please let me know if I am doing something foolish, as I am new to Java multithreading.

Solution

Future#get blocks until your callable finishes. So the main thread submits a Callable to the pool, then waits for it to finish before submitting the next one. On top of the calculation itself you pay for creating the pool's four threads, for context switches between the main thread and the worker threads, and for creating the callables (which then have to be garbage collected as they are discarded). And after all that, none of the work is actually done concurrently.

It is puzzling how you could ever get numbers where the version using the pool is faster. When I run this locally (good job making an MCVE, by the way; I can copy and paste it with no changes and it works), the thread-pool part is consistently slower: it takes about 3 times as long as the single-threaded code.
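To let the pool actually work in parallel, hand over all the tasks before waiting on any result. Below is a minimal sketch of that idea using ExecutorService#invokeAll. The class name PooledCalculation and the helper methods calculate and runAll are made up for this illustration and are not from the question. The loop body is copied from the question's call() method.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical class name, used only for this sketch.
public class PooledCalculation {

    // Same loop as the question's call() method.
    static long calculate() {
        long k = 0L;
        for (int i = 0; i < 100000; i++) {
            for (int j = 0; j < 50; j++) {
                k = i + j;
            }
        }
        return k;
    }

    // Hypothetical helper: runs taskCount calculations on threadCount threads.
    static List<Long> runAll(int taskCount, int threadCount)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threadCount);
        try {
            List<Callable<Long>> tasks = new ArrayList<>();
            for (int n = 0; n < taskCount; n++) {
                tasks.add(PooledCalculation::calculate);
            }
            // invokeAll hands every task to the pool and returns only
            // after all of them have completed.
            List<Future<Long>> futures = pool.invokeAll(tasks);

            // Every future is already done here, so get() does not block.
            List<Long> results = new ArrayList<>();
            for (Future<Long> future : futures) {
                results.add(future.get());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        long startTime = System.nanoTime();
        List<Long> results = runAll(5000, 4);
        long endTime = System.nanoTime();
        System.out.println(results.size() + " tasks took " + (endTime - startTime) + " ns");
    }
}
```

Submitting each task in a loop and collecting the Futures in a list, then calling get() in a second loop, would have the same effect. Even with this change, expect some run-to-run variation: timings taken with System.nanoTime around a loop like this are affected by JIT warm-up and thread scheduling, and the JIT may optimize away part of a loop whose result is barely used.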