Tags: java, performance, object, profiling, primitive

Adding numbers using Java Long wrapper versus primitive longs


I am running this code and getting unexpected results. I expected the loop that adds the primitives to perform much faster, but the results do not agree.

import java.util.*;

public class Main {
    public static void main(String[] args) {
        StringBuilder output = new StringBuilder();
        long start = System.currentTimeMillis();
        long limit = 1000000000; //10^9
        long value = 0;
        for(long i = 0; i < limit; ++i){}
        long i;
        output.append("Base time\n");
        output.append(System.currentTimeMillis() - start + "ms\n");
        start = System.currentTimeMillis();
        for(long j = 0; j < limit; ++j) {
            value = value + j;
        }
        output.append("Using longs\n");
        output.append(System.currentTimeMillis() - start + "ms\n");
        start = System.currentTimeMillis();
        value = 0;
        for(long k = 0; k < limit; ++k) {
            value = value + (new Long(k));
        }
        output.append("Using Longs\n");
        output.append(System.currentTimeMillis() - start + "ms\n");
        System.out.print(output);
    }
}

Output:

Base time
359ms
Using longs
1842ms
Using Longs
614ms

I have tried running each individual test in its own Java program, but the results are the same. What could cause this?

Small detail: running Java 1.6

Edit: I asked two other people to try out this code; one gets exactly the same strange results that I get, and the other gets results that actually make sense! I asked the guy who got normal results to give us his class binary. We ran it and we STILL got the strange results. The problem is not at compile time (I think). I'm running 1.6.0_31, the guy who gets normal results is on 1.6.0_16, and the guy who gets strange results like I do is on 1.7.0_04.

Edit: I get the same results with a Thread.sleep(5000) at the start of the program. I also get the same results with a while loop around the whole program (to see whether the times would converge to normal once Java was fully started up).


Solution

  • I suspect that this is a JVM warmup effect. Specifically, the code is being JIT compiled at some point, and this is distorting the times that you are seeing.

    Put the whole lot in a loop, and ignore the times reported until they stabilize. (But note that they won't entirely stabilize. Garbage is being generated, so the GC will need to kick in occasionally. This is liable to distort the timings, at least a bit. The best way to deal with this is to run a huge number of iterations of the outer loop and calculate and display the average times.)
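
    For illustration, here is a minimal sketch of such a harness. The round counts are arbitrary, and runTest() is just a placeholder for whichever loop you actually want to measure:

    public class AverageTiming {
        private static final int WARMUP_ROUNDS = 20;    // rounds discarded while the JIT warms up
        private static final int MEASURED_ROUNDS = 100;  // rounds included in the average

        public static void main(String[] args) {
            long totalMillis = 0;
            for (int round = 0; round < WARMUP_ROUNDS + MEASURED_ROUNDS; round++) {
                long start = System.currentTimeMillis();
                runTest();  // placeholder for the code under test
                long elapsed = System.currentTimeMillis() - start;
                if (round >= WARMUP_ROUNDS) {
                    totalMillis += elapsed;
                }
            }
            System.out.println("average: " + ((double) totalMillis / MEASURED_ROUNDS) + "ms");
        }

        private static void runTest() {
            // ... the code being timed goes here ...
        }
    }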

    Another problem is that the JIT compiler on some releases of Java may be able to optimize away the stuff you are trying to test:

    • It could figure out that the creation and immediate unboxing of the Long objects can be optimized away. (Thanks Louis!)

    • It could figure out that the loops are doing "busy work" ... and optimize them away entirely. (The value of value is not used once each loop ends.)
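
    For instance, using the result after the loop (which is what the updated code further down does when it prints value) means the JIT can no longer prove the loop is dead. A minimal sketch, with an arbitrary class name:

    public class KeepLoopAlive {
        public static void main(String[] args) {
            long limit = 1000000000L; //10^9
            long value = 0;
            for (long k = 0; k < limit; ++k) {
                value = value + k;
            }
            // Printing value makes the loop's result observable, so the JIT
            // cannot simply delete the loop as "busy work".
            System.out.println(value);
        }
    }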


    FWIW, it is generally recommended that you use Long.valueOf(long) rather than new Long(long) because the former can make use of a cached Long instance. However, in this case, we can predict that there will be a cache miss in all but the first few loop iterations, so the recommendation is not going to help. If anything, it is likely to make the loop in question slower.
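
    A small demonstration of the difference (the class name is arbitrary, and the exact cache range is an implementation detail, typically -128 to 127):

    public class LongCacheDemo {
        public static void main(String[] args) {
            // new Long(...) is guaranteed to allocate a fresh object every time.
            System.out.println(new Long(100L) == new Long(100L));         // false

            // Long.valueOf(...) may return a cached instance for small values,
            // so the two references are typically identical.
            System.out.println(Long.valueOf(100L) == Long.valueOf(100L)); // usually true

            // Larger values miss the cache, so a new object is created anyway,
            // which is why valueOf would not help this particular benchmark.
            System.out.println(Long.valueOf(1000000L) == Long.valueOf(1000000L)); // usually false
        }
    }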


    UPDATE

    I did some investigation of my own, and ended up with the following:

    import java.util.*;
    
    public class Main {
        public static void main(String[] args) {
            while (true) {
                test();
            }
        }
    
        private static void test() {
            long start = System.currentTimeMillis();
            long limit = 10000000; //10^7
            long value = 0;
            // t1: empty loop (baseline loop overhead)
            for(long i = 0; i < limit; ++i){}
            long t1 = System.currentTimeMillis() - start;
            start = System.currentTimeMillis();
            // t2: primitive long addition
            for(long j = 0; j < limit; ++j) {
                value = value + j;
            }
            long t2 = System.currentTimeMillis() - start;
            start = System.currentTimeMillis();
            // t3: boxed addition; a new Long is created and unboxed on every iteration
            for(long k = 0; k < limit; ++k) {
                value = value + (new Long(k));
            }
            long t3 = System.currentTimeMillis() - start;
            // printing value stops the JIT from treating the addition loops as dead code
            System.out.print(t1 + " " + t2 + " " + t3 + " " + value + "\n");
        }
    }
    

    which gave me the following output.

    28 58 2220 99999990000000
    40 58 2182 99999990000000
    36 49 157 99999990000000
    34 51 157 99999990000000
    37 49 158 99999990000000
    33 52 158 99999990000000
    33 50 159 99999990000000
    33 54 159 99999990000000
    35 52 159 99999990000000
    33 52 159 99999990000000
    31 50 157 99999990000000
    34 51 156 99999990000000
    33 50 159 99999990000000
    

    Note that the first two columns are pretty stable, but the third one shows a significant speedup on the 3rd iteration ... probably indicating that JIT compilation has occurred.

    Interestingly, before I separated out the test into a separate method, I didn't see the speedup on the 3rd iteration. The numbers all looked like the first two rows. And that seems to be saying that the JVM (that I'm using) won't JIT compile a method that is currently executing ... or something like that.

    Anyway, this demonstrates (to me) that there should be a warmup effect. If you don't see one, your benchmark is doing something that is inhibiting JIT compilation ... and therefore isn't meaningful for real applications.