Why Iterable.sum() is slow in Kotlin?

I've noticed surprising difference between performance of Itarable.sum() and direct for loop with manual sum. Consider this:

import kotlin.system.measureTimeMillis

fun main(args: Array<String>) {
    var sink = 0;
    repeat(5) {
        println(measureTimeMillis {
            var sum = 0
            for (i in 1..10_000_000) {
                sum += i
            }
            sink += sum
        })
    }
    repeat(5) {
        println(measureTimeMillis {
            sink += (1..10_000_000).sum()
        })
    }
}

Surprisingly, using Iterable.sum() is up to 10 times slower, compared to the code that is almost identical to sum() implementation. Why is that?

Update:

When I target js, then sum() is only slightly slower.

measureTimeMillis() can be defined as:

import kotlin.js.Date
public inline fun measureTimeMillis(block: () -> Unit): Double {
    val start = Date.now()
    block()
    return Date.now() - start
}

Update2:

On same linux machine, jvm sum() is even slower than js. Here are results for 100_000_000 iterations for jvm (Oracle jdk9) and js (latest chrome):

105   // jvm raw loop
76    // jvm raw loop (jit?)
75    // jvm raw loop (jit?)
75    // jvm raw loop (jit?)
70    // jvm raw loop (jit?)
633   // jvm sum()
431   // jvm sum()
562   // jvm sum()
327   // jvm sum() (jit?)
332   // jvm sum() (jit?)

110   // js raw loop
108   // js raw loop
232   // js raw loop
227   // js raw loop
227   // js raw loop
321   // js sum()
284   // js sum()
264   // js sum()
266   // js sum()
265   // js sum()

So, on same machine, jvm seems to be slower than js when using sum(). Yet another surprise.

Solution

Clearly, we're comparing super-optimized tight loops here. I'm seeing quite stable results across repetitions for the "manual sum" and wild variance in the "built-in" case. This indicates GC activity.

Upon launching VisualVM and using its VisualGC plugin, I confirmed that there's no GC activity during the manual sum computation, but a lot of it in the built-in case.

Looking at the generated bytecode, the difference becomes apparent: the idiom for (i in 1..range) { ... } compiles directly into a counted loop. This is actually documented:

Integral type ranges (IntRange, LongRange, CharRange) have an extra feature: they can be iterated over. The compiler takes care of converting this analogously to Java's indexed for-loop, without extra overhead.

Unfortunately, the same optimization doesn't apply to the extension function Iterable.sum() because it must work for any Iterable. The compiler could see what's going on and introduce another intrinsic, which would simply convert the whole thing into the resulting sum without computation, or use a direct formula if the range bounds aren't hardcoded.

JavaScript is on a similar footing here because it, too, has a powerful JIT compiler. I can't comment anything specific, but it most probably avoids allocation in the hot loop.