java multithreading executorservice microbenchmark

Benchmarking in a multithreaded environment


I am learning multithreading and noticed that Object.hashCode slows down in a multithreaded environment: computing the default hash code takes over twice as long with 4 threads as with 1 thread, for the same number of objects per thread.

As I understand it, doing this in parallel should take a similar amount of time.

You can change the number of threads via THREAD_COUNT. Each thread has the same amount of work to do, so I'd hope that running 4 threads on my quad-core machine would take about the same time as running a single thread.

I'm seeing ~2.3 seconds for 4 threads but ~0.9 seconds for 1 thread.

Is there a gap in my understanding? Please help me understand this behaviour.

import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.ThreadFactory;

public class ObjectHashCodePerformance {

    private static final int THREAD_COUNT = 4;
    private static final int ITERATIONS = 20000000;

    public static void main(final String[] args) throws Exception {
        long start = System.currentTimeMillis();
        new ObjectHashCodePerformance().run();
        // Elapsed wall-clock time in milliseconds.
        System.err.println(System.currentTimeMillis() - start);
    }

    // Fixed-size pool of daemon threads, so the JVM can exit once main() returns.
    private final ExecutorService _service = Executors.newFixedThreadPool(THREAD_COUNT,
            new ThreadFactory() {
                private final ThreadFactory _delegate = Executors.defaultThreadFactory();

                @Override
                public Thread newThread(final Runnable r) {
                    Thread thread = _delegate.newThread(r);
                    thread.setDaemon(true);
                    return thread;
                }
            });

    private void run() throws Exception {
        // Each task allocates ITERATIONS objects and asks each one for its default hash code.
        Callable<Void> work = new Callable<Void>() {
            @Override
            public Void call() throws Exception {
                for (int i = 0; i < ITERATIONS; i++) {
                    Object object = new Object();
                    object.hashCode();
                }
                return null;
            }
        };
        // Submit the same task once per thread and wait for all of them to finish.
        @SuppressWarnings("unchecked")
        Callable<Void>[] allWork = new Callable[THREAD_COUNT];
        Arrays.fill(allWork, work);
        List<Future<Void>> futures = _service.invokeAll(Arrays.asList(allWork));
        for (Future<Void> future : futures) {
            future.get();
        }
    }

}

With THREAD_COUNT = 4 the output is

~2.3 seconds

With THREAD_COUNT = 1 the output is

~0.9 seconds

Solution

  • I've created a simple JMH benchmark to test the various cases:

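    import java.util.concurrent.TimeUnit;

    import org.openjdk.jmh.annotations.Benchmark;
    import org.openjdk.jmh.annotations.BenchmarkMode;
    import org.openjdk.jmh.annotations.Fork;
    import org.openjdk.jmh.annotations.Measurement;
    import org.openjdk.jmh.annotations.Mode;
    import org.openjdk.jmh.annotations.OutputTimeUnit;
    import org.openjdk.jmh.annotations.Scope;
    import org.openjdk.jmh.annotations.State;
    import org.openjdk.jmh.annotations.Threads;
    import org.openjdk.jmh.annotations.Warmup;
    import org.openjdk.jmh.infra.Blackhole;

    @Fork(1)
    @State(Scope.Benchmark)
    @OutputTimeUnit(TimeUnit.NANOSECONDS)
    @Measurement(iterations = 10)
    @Warmup(iterations = 10)
    @BenchmarkMode(Mode.AverageTime)
    public class HashCodeBenchmark {
        // Scope.Benchmark: this single instance is shared by all benchmark threads.
        private final Object object = new Object();

        @Benchmark
        @Threads(1)
        public void singleThread(Blackhole blackhole) {
            // The Blackhole consumes the result so the call is not optimized away.
            blackhole.consume(object.hashCode());
        }

        @Benchmark
        @Threads(2)
        public void twoThreads(Blackhole blackhole) {
            blackhole.consume(object.hashCode());
        }

        @Benchmark
        @Threads(4)
        public void fourThreads(Blackhole blackhole) {
            blackhole.consume(object.hashCode());
        }

        @Benchmark
        @Threads(8)
        public void eightThreads(Blackhole blackhole) {
            blackhole.consume(object.hashCode());
        }
    }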
    

    And the results are as follows:

    Benchmark                       Mode  Cnt  Score   Error  Units
    HashCodeBenchmark.eightThreads  avgt   10  5.710 ± 0.087  ns/op
    HashCodeBenchmark.fourThreads   avgt   10  3.603 ± 0.169  ns/op
    HashCodeBenchmark.singleThread  avgt   10  3.063 ± 0.011  ns/op
    HashCodeBenchmark.twoThreads    avgt   10  3.067 ± 0.034  ns/op
    

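    The time per hashCode() call stays roughly constant from one to four threads (3.06 to 3.60 ns/op) and rises to 5.71 ns/op at eight threads. So as long as there are no more threads than cores, the time per call remains about the same.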

    PS: As @Tom Cools commented, your test measures allocation speed rather than hashCode() speed.
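
To illustrate the PS above, here is a minimal plain-Java sketch that allocates the objects before the clock starts, so that only the hashCode() calls are timed, and that caps the thread count at the number of available cores. It is not a substitute for JMH (there is no warm-up or fork control), and the class name HashCodeOnlyTiming and the helpers cappedThreads, preallocate and timeHashCodes are hypothetical names chosen for this example, not part of the original code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class HashCodeOnlyTiming {

    // Written by every worker so the JIT cannot discard the hashCode() results.
    static volatile int sink;

    // Hypothetical helper: never use more worker threads than available cores.
    static int cappedThreads(int requested) {
        int cores = Runtime.getRuntime().availableProcessors();
        return Math.max(1, Math.min(requested, cores));
    }

    // Allocate outside the timed region so allocation cost is not measured.
    static Object[] preallocate(int count) {
        Object[] objects = new Object[count];
        for (int i = 0; i < count; i++) {
            objects[i] = new Object();
        }
        return objects;
    }

    // Returns the elapsed nanoseconds of the slowest worker's hashCode() loop.
    static long timeHashCodes(int threads, int objectsPerThread) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<Long>> tasks = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                // Each worker gets its own pre-allocated array of fresh objects.
                final Object[] objects = preallocate(objectsPerThread);
                tasks.add(() -> {
                    int acc = 0;
                    long start = System.nanoTime();
                    for (Object o : objects) {
                        acc ^= o.hashCode();
                    }
                    long elapsed = System.nanoTime() - start;
                    sink = acc;
                    return elapsed;
                });
            }
            long slowest = 0;
            for (Future<Long> future : pool.invokeAll(tasks)) {
                slowest = Math.max(slowest, future.get());
            }
            return slowest;
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        int objectsPerThread = 1_000_000;
        for (int requested : new int[] {1, 2, 4}) {
            int threads = cappedThreads(requested);
            long nanos = timeHashCodes(threads, objectsPerThread);
            System.out.println(threads + " thread(s): "
                    + (nanos / (double) objectsPerThread) + " ns/op");
        }
    }
}
```

Treat the absolute numbers from a hand-rolled loop like this with caution. The point is only that allocation now happens outside the timed region, so any remaining difference between thread counts is no longer dominated by the cost of new Object().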