Kotlin measureNanoTime result differs vastly from kotlinx-benchmark (JMH)

I'm testing with the following class (you can find a Git repository here):

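import java.util.concurrent.TimeUnit
import kotlin.system.measureNanoTime
import org.openjdk.jmh.annotations.*

@ExperimentalStdlibApi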
@State(Scope.Benchmark)
class TestBenchmark {

    private fun benchmark() : List<Int> {
        return buildList {
            addAll(0..100)
            shuffle()
            sortDescending()
        }
    }

    final fun measureTime() {
        val result: Any?
        val time = measureNanoTime {
            result = benchmark()
        }
        println("$time ns")
    }

    @Benchmark
    @BenchmarkMode(Mode.SampleTime)
    @OutputTimeUnit(TimeUnit.NANOSECONDS)
    final fun benchmarkFunction() {
        benchmark()
    }
}

@ExperimentalStdlibApi
fun main() {
    TestBenchmark().measureTime()
}

With Kotlin's measureNanoTime I get 65238400 ns (about 65 ms) on my machine. However, when running a benchmark with kotlinx-benchmark via gradlew benchmark, I get:

  Success: N = 611869
  mean =  12388,465 ±(99.9%) 39,959 ns/op

How is that possible?

Solution

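You simply can't use measureNanoTime for microbenchmarks like this: a single measurement is completely unreliable. Your one call measures a cold start. It can include class loading and initialization, and it runs in the bytecode interpreter before the JIT compiler has optimized anything, whereas the JMH figure is the mean over hundreds of thousands of warmed-up calls. On top of the JVM's runtime optimizations, there is non-deterministic behavior such as garbage collection, plus system effects like power management, OS scheduling, and time sharing.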
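To see the cold-start effect directly, the sketch below times every call of the question's workload separately instead of just once. The names workload, sampleCalls and checksum are hypothetical, introduced only for this illustration, and it assumes Kotlin 1.6 or later, where buildList no longer needs @ExperimentalStdlibApi. It is still not a valid benchmark; it only illustrates why the very first measurement tends to be the outlier.

```kotlin
import kotlin.system.measureNanoTime

// Hypothetical global: folding each result into it keeps the JIT from
// treating the measured calls as dead code.
var checksum = 0L

// Same work as the question's benchmark(): 0..100, shuffled, then sorted descending.
// Assumes Kotlin 1.6+, where buildList is stable (older versions need the opt-in).
fun workload(): List<Int> = buildList {
    addAll(0..100)
    shuffle()
    sortDescending()
}

// Times each of `iterations` consecutive calls separately: sample 0 is the cold
// first call, later samples reflect code the JIT has had a chance to compile.
fun sampleCalls(iterations: Int): LongArray = LongArray(iterations) {
    measureNanoTime { checksum += workload().first() }
}
```

Comparing sampleCalls(100_000).first() with, say, the median of the last few thousand samples would typically show the first call being much slower. The exact ratio depends on the JVM and the machine, and even the warm per-call numbers are distorted by the cost and granularity of System.nanoTime() itself.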

The entire point of JMH is to provide a harness that works around as many of these issues as possible (warm-up iterations, forked JVMs, protection against dead-code elimination) so that microbenchmark results are more reliable.
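To make a few of those ideas concrete, here is a deliberately naive sketch: warm-up batches whose timings are discarded, timing whole batches of calls instead of single calls, and consuming every result. The names naiveHarness and blackholeSink are hypothetical and made up for this illustration; this is not JMH's API, and it lacks forking, statistical error bounds and many other safeguards, so real measurements should still go through JMH or kotlinx-benchmark.

```kotlin
import kotlin.system.measureNanoTime

// Hypothetical sink (a crude stand-in for JMH's Blackhole): results are folded
// into this global so the JIT cannot drop the benchmarked calls as dead code.
var blackholeSink = 0L

// Hypothetical toy harness, NOT JMH: returns the mean time per call in ns.
fun <T> naiveHarness(
    warmupBatches: Int,
    measuredBatches: Int,
    opsPerBatch: Int,
    op: () -> T
): Double {
    // One batch = opsPerBatch back-to-back calls, timed as a whole so the
    // cost of System.nanoTime() itself is spread over many calls.
    fun runBatch(): Long = measureNanoTime {
        repeat(opsPerBatch) { blackholeSink += System.identityHashCode(op()) }
    }

    // Warm-up: gives the JIT time to compile the hot path; timings are thrown away.
    repeat(warmupBatches) { runBatch() }

    // Measurement: only batches run after warm-up contribute to the mean.
    var totalNs = 0L
    repeat(measuredBatches) { totalNs += runBatch() }
    return totalNs.toDouble() / (measuredBatches.toLong() * opsPerBatch)
}
```

For example, naiveHarness(20, 50, 1_000) { (0..100).shuffled().sortedDescending() } runs a variant of the question's workload; one would expect its result to sit much closer to the JMH figure than to the single cold measurement, but without forks and error bounds like JMH's ±(99.9%) interval it remains far less trustworthy.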

See the article "Avoiding Benchmarking Pitfalls on the JVM" for a deeper discussion of the issues.

Aleksey Shipilëv, one of the maintainers of JMH, has many fascinating articles and talks about the subject. See:

https://shipilev.net/#benchmarking ("Two Timestamps" Story)
https://shipilev.net/#benchmarking-1 ("The Lesser of Two Evils" Story)
http://shipilev.net/blog/2014/nanotrusting-nanotime/ ("Nanotrusting the Nanotime")
The Javadocs of the JMH samples are fascinating as well.