java performance jvm microbenchmark jmh

Performance difference in arithmetic operations between static and non-static field


I have a class which counts events. It looks like this:

public class Counter {
    private static final long BUCKET_SIZE_NS = Duration.ofMillis(100).toNanos();
    ...

    private long nextBucketNum() {
        return clock.getTime() / BUCKET_SIZE_NS;
    }

    public void count() {
        ...
        final long num = nextBucketNum();
        ...
    }
    ...
}

If I remove the static modifier from the field (intending to make it a class parameter), the counting throughput drops by more than 25% according to the JMH report.
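
For illustration, the non-static variant might look roughly like the sketch below. This is not the original code: the class name InstanceCounter, the constructor, and the field name bucketSizeNs are hypothetical, and the Clock interface is assumed from the getTime ()J signature visible in the bytecode. The elided counting logic is left out.

```java
import java.time.Duration;

// Assumed shape of the clock dependency: an interface with a method
// getTime() returning long, as the INVOKEINTERFACE instruction suggests.
interface Clock {
    long getTime();
}

// Hypothetical non-static variant of the Counter class from the question.
class InstanceCounter {
    // Per-instance field: read with ALOAD 0 + GETFIELD on every call,
    // instead of GETSTATIC on a static final constant.
    private final long bucketSizeNs;
    private final Clock clock;

    InstanceCounter(Clock clock, Duration bucketSize) {
        this.clock = clock;
        this.bucketSizeNs = bucketSize.toNanos();
    }

    long nextBucketNum() {
        // The divisor is now a value loaded at run time.
        return clock.getTime() / bucketSizeNs;
    }
}
```

With a fixed clock returning 250,000,000 ns and a 100 ms bucket, nextBucketNum() yields bucket 2, the same result as the static version; only the way the divisor is obtained differs.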

The generated bytecode for static case:

INVOKEINTERFACE Clock.getTime ()J (itf)
GETSTATIC Counter.BUCKET_SIZE_NS : J
LDIV

And for non-static one:

INVOKEINTERFACE Clock.getTime ()J (itf)
ALOAD 0
GETFIELD Counter.BUCKET_SIZE_NS : J
LDIV

Am I running the performance test incorrectly and seeing some sort of dead-code elimination, or is this a legitimate micro-optimization at some level, such as the JIT or Hyper-Threading?

The difference shows up in both single-threaded and multi-threaded benchmarks.

Environment:

JMH version: 1.34
VM version: JDK 1.8.0_161, Java HotSpot(TM) 64-Bit Server VM, 25.161-b12

macOS Monterey 12.2.1

Intel(R) Core(TM) i7-4770HQ CPU @ 2.20GHz

Solution

  • There are 2 optimizations at play here:

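    • Constant folding: The JIT treats a static final field as a true constant, so its value is written directly into the code blob (the end result of JIT compilation). This translates into a performance win compared to a memory load when reading the field. An instance field, even a final one, is generally not treated as a constant by HotSpot, so it has to be loaded on each call.
    • Arithmetic simplification: When dividing by a potentially variable quantity, the compiler has to emit a division instruction, which is among the most expensive integer operations. When dividing by a constant, the compiler can come up with a cheaper alternative. This is particularly true when dividing (and multiplying) by powers of 2, which can be simplified into shift instructions. Other constants, such as the 100,000,000 here, can typically be handled with a multiplication by a precomputed reciprocal followed by shifts.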
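
    The arithmetic behind the second point can be checked with a small sketch. It is only an illustration of the general technique, not the code HotSpot actually emits: the class and method names are made up, BigInteger stands in for the 64-bit multiply-high a JIT would use, and the magic-number variant is restricted to non-negative inputs.

```java
import java.math.BigInteger;

// Hypothetical helper showing how division by a known constant can be
// rewritten without a division instruction.
final class DivideByConstantSketch {
    // Same value as Duration.ofMillis(100).toNanos().
    static final long DIVISOR = 100_000_000L;

    // For 0 <= x < 2^63 and DIVISOR < 2^27, shifting by 63 + 27 bits is enough.
    static final int SHIFT = 90;

    // Precomputed "reciprocal": floor(2^SHIFT / DIVISOR) + 1.
    static final BigInteger MAGIC = BigInteger.ONE.shiftLeft(SHIFT)
            .divide(BigInteger.valueOf(DIVISOR))
            .add(BigInteger.ONE);

    // x / DIVISOR computed as (x * MAGIC) >> SHIFT, for non-negative x only.
    // BigInteger is used here to verify the identity, not for speed.
    static long divideByMultiplyShift(long x) {
        if (x < 0) {
            throw new IllegalArgumentException("sketch covers non-negative inputs only");
        }
        return BigInteger.valueOf(x).multiply(MAGIC).shiftRight(SHIFT).longValueExact();
    }

    // Power-of-two case: signed x / 8 as an add and a shift.
    // The bias is 7 for negative x and 0 otherwise, so the result
    // rounds toward zero like Java's '/' operator.
    static long divideBy8(long x) {
        long bias = (x >> 63) >>> 61;
        return (x + bias) >> 3;
    }
}
```

    Both methods return the same values as the plain / operator for the inputs they accept. The point is that once the divisor is known at compile time, the divide can be replaced by cheaper multiplies, adds, and shifts; with a divisor loaded from a field, that rewrite is not available.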

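    To look further into this, I would recommend running your benchmark with JMH's perfasm profiler (-prof perfasm) to see where the cycles went and what assembly code was generated. Note that perfasm relies on Linux perf; on macOS, JMH's dtraceasm profiler plays a similar role.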

    Happy hunting!