Tags: java, cpu-usage, environment, jmh

JMH: strange dependency on the environment


While making my first attempts at using JMH to benchmark my class, I encountered a behavior that confuses me, and I'd like to clarify the issue before moving on.

The situation that confuses me:
When I run the benchmarks while the CPU is loaded (78%-80%) by extraneous processes, the results shown by JMH look quite plausible and stable:

Benchmark                                  Mode  Cnt    Score   Error  Units
ArrayOperations.a_bigDecimalAddition       avgt    5  264,703 ± 2,800  ns/op
ArrayOperations.b_quadrupleAddition        avgt    5   44,290 ± 0,769  ns/op
ArrayOperations.c_bigDecimalSubtraction    avgt    5  286,266 ± 2,454  ns/op
ArrayOperations.d_quadrupleSubtraction     avgt    5   46,966 ± 0,629  ns/op
ArrayOperations.e_bigDecimalMultiplcation  avgt    5  546,535 ± 4,988  ns/op
ArrayOperations.f_quadrupleMultiplcation   avgt    5   85,056 ± 1,820  ns/op
ArrayOperations.g_bigDecimalDivision       avgt    5  612,814 ± 5,943  ns/op
ArrayOperations.h_quadrupleDivision        avgt    5  631,127 ± 4,172  ns/op

The relatively large errors are there because I only need a rough estimate right now, so I deliberately trade precision for speed.

But the results obtained without extraneous load on the processor seem amazing to me:

Benchmark                                  Mode  Cnt    Score     Error  Units
ArrayOperations.a_bigDecimalAddition       avgt    5  684,035 ± 370,722  ns/op
ArrayOperations.b_quadrupleAddition        avgt    5   83,743 ±  25,762  ns/op
ArrayOperations.c_bigDecimalSubtraction    avgt    5  531,430 ± 184,980  ns/op
ArrayOperations.d_quadrupleSubtraction     avgt    5   85,937 ± 103,351  ns/op
ArrayOperations.e_bigDecimalMultiplcation  avgt    5  641,953 ± 288,545  ns/op
ArrayOperations.f_quadrupleMultiplcation   avgt    5  102,692 ±  31,625  ns/op
ArrayOperations.g_bigDecimalDivision       avgt    5  733,727 ± 161,827  ns/op
ArrayOperations.h_quadrupleDivision        avgt    5  820,388 ± 546,990  ns/op

Everything seems to run almost twice as slowly, iteration times are very unstable (they may vary from 500 to 1300 ns/op between neighboring iterations), and the errors are correspondingly, and unacceptably, large.

The first set of results is obtained with a bunch of applications running, including the Folding@home distributed-computing client (FahCore_a7.exe), which takes 75% of CPU time, a BitTorrent client that actively uses the disks, a dozen browser tabs, an e-mail client, etc. The average CPU load is about 85%. During the benchmark execution, FahCore decreases its load so that Java takes 25% and the total load is 100%.

The second set of results is taken when all unnecessary processes are stopped and the CPU is practically idle: only Java takes its 25%, and a couple of percent goes to system needs.

My CPU is an Intel i5-4460, 4 cores, 3.2 GHz; RAM 32 GB; OS Windows Server 2008 R2.
java version "1.8.0_231"
Java(TM) SE Runtime Environment (build 1.8.0_231-b11)
Java HotSpot(TM) 64-Bit Server VM (build 25.231-b11, mixed mode)

The questions are:

  1. Why do the benchmarks show much worse and less stable results when benchmarking is the only task loading the machine?
  2. Can I consider the first set of results more or less reliable when they depend on the environment so dramatically?
  3. Should I set up the environment somehow to eliminate this dependency?
  4. Or is my code to blame?

The code:

package com.mvohm.quadruple.benchmarks;

import java.io.IOException;
import java.math.BigDecimal;
import java.math.MathContext;
import java.math.RoundingMode;
import java.util.Random;

import org.openjdk.jmh.annotations.*;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

import com.mvohm.quadruple.Quadruple; // The class under tests

@State(value = Scope.Benchmark)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(java.util.concurrent.TimeUnit.NANOSECONDS)
@Fork(value = 1)
@Warmup(iterations = 3, time = 7)
@Measurement(iterations = 5, time = 10)
public class ArrayOperations {

  // To do BigDecimal arithmetic with the precision close to this of Quadruple
  private static final MathContext MC_38 = new MathContext(38, RoundingMode.HALF_EVEN);

  private static final int DATA_SIZE = 0x1_0000;        // 65536
  private static final int INDEX_MASK = DATA_SIZE - 1;  // 0xFFFF

  private static final double RAND_SCALE = 1e39; // To provide a sensible range of operands,
                                                 // so that the actual calculations don't get bypassed

  private final BigDecimal[]      // Data to apply operations to
      bdOp1     = new BigDecimal[DATA_SIZE],  // BigDecimals 
      bdOp2     = new BigDecimal[DATA_SIZE],
      bdResult  = new BigDecimal[DATA_SIZE];
  private final Quadruple[]
      qOp1      = new Quadruple[DATA_SIZE],   // Quadruples
      qOp2      = new Quadruple[DATA_SIZE],
      qResult   = new Quadruple[DATA_SIZE];

  private int index = 0;

  @Setup
  public void initData() {
    final Random rand = new Random(12345); // for reproducibility
    for (int i = 0; i < DATA_SIZE; i++) {
      bdOp1[i] = randomBigDecimal(rand);
      bdOp2[i] = randomBigDecimal(rand);
      qOp1[i] = randomQuadruple(rand);
      qOp2[i] = randomQuadruple(rand);
    }
  }

  private static Quadruple randomQuadruple(Random rand) {
    return Quadruple.nextNormalRandom(rand).multiply(RAND_SCALE); // ranged 0 .. 9.99e38
  }

  private static BigDecimal randomBigDecimal(Random rand) {
    return Quadruple.nextNormalRandom(rand).multiply(RAND_SCALE).bigDecimalValue();
  }

  @Benchmark
  public void a_bigDecimalAddition() {
    bdResult[index] = bdOp1[index].add(bdOp2[index], MC_38);
    index = (index + 1) & INDEX_MASK;   // cycle through the pre-generated operands
  }

  @Benchmark
  public void b_quadrupleAddition() {
    // semantically the same as above 
    qResult[index] = Quadruple.add(qOp1[index], qOp2[index]); 
    index = (index + 1) & INDEX_MASK;
  }

  // Other methods are similar 
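
  // A sketch (not from the original post) of what one of the remaining pairs
  // presumably looks like. BigDecimal.multiply(..., MC_38) is standard API;
  // the static Quadruple.multiply is an assumption, mirroring Quadruple.add above.
  @Benchmark
  public void e_bigDecimalMultiplcation() {
    bdResult[index] = bdOp1[index].multiply(bdOp2[index], MC_38);
    index = (index + 1) & INDEX_MASK;
  }

  @Benchmark
  public void f_quadrupleMultiplcation() {
    qResult[index] = Quadruple.multiply(qOp1[index], qOp2[index]);
    index = (index + 1) & INDEX_MASK;
  }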

  public static void main(String... args) throws IOException, RunnerException {
    final Options opt = new OptionsBuilder()
        .include(ArrayOperations.class.getSimpleName())
        .forks(1)
        .build();
    new Runner(opt).run();
  }

}

Solution

  • The reason was very simple, and I should have understood it immediately. Power-saving mode was enabled in the OS, which reduced the CPU clock frequency under low load; with Folding@home keeping the machine busy, the CPU always ran at full clock, which is why the "loaded" results were faster and more stable. The moral: always disable power saving when benchmarking!
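
    On the JMH side, here is a minimal sketch of runner options that help expose this kind of environment sensitivity. The builder calls are standard JMH API, but the concrete numbers are illustrative assumptions, not settings from the original post: running several forks with longer warmups lets the between-fork spread show whether the clock has stabilized.

    import org.openjdk.jmh.runner.Runner;
    import org.openjdk.jmh.runner.options.Options;
    import org.openjdk.jmh.runner.options.OptionsBuilder;
    import org.openjdk.jmh.runner.options.TimeValue;

    // Sketch only: several forks and longer warmups make run-to-run variance
    // (e.g. from CPU frequency scaling) visible in the between-fork spread.
    final Options opt = new OptionsBuilder()
        .include(ArrayOperations.class.getSimpleName())
        .forks(3)                               // compare results across independent JVM runs
        .warmupIterations(5)
        .warmupTime(TimeValue.seconds(10))      // give the clock time to ramp up and settle
        .measurementIterations(10)
        .measurementTime(TimeValue.seconds(10))
        .build();
    new Runner(opt).run();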