I was doing some performance testing regarding object allocation, when I came across a weird result. I have the following java code:
public static long TestMethod(){
int len = 10000000;
Object[] obs = new Object[len];
long t = System.nanoTime();
for (int i = 0; i < len; i++) {
obs[i] = new Object();
}
return System.nanoTime() - t;
}
public static void main(String... args) throws InterruptedException {
for(int i = 0; i < 10; i++){
System.gc();
System.gc();//Wait for the gc to be finished
Thread.sleep(1000);
System.out.println(TestMethod());
}
}
Expectation: First call will be slower then second call, due to requesting a larger memory space from the operating system and hotspot enhancements. But second and third will be nearly the same.
Observed result:
11284734000
799837000
682304000
304736000
380770000
392786000
374279000
381611000
379174000
407256000
Still a considerable speed up between the third and fourth iteration. What causes this speedup? How can I be sure my measurements are accurate when testing other functions, do I have to call the function more than four times before measuring?
What causes this speedup?
It is most likely JIT compilation, but it could also be contributions from code loading and/or heap warm up effects.
How can I be sure my measurements are accurate when testing other functions, do I have to call the function more than four times before measuring?
You need to do something like that. There is no other way to eliminate JVM warmup effects from your measurements, and still get representative results. Writing valid "micro-benchmarks" for Java is difficult, and you need to read up on all of the issues before you try. Start with this: How do I write a correct micro-benchmark in Java?
I would also note a couple of other things:
Your attempts to remove the cost of garbage collection from your measurements (I assume that's what you seem to be going) appear to have failed. It looks like you are getting minor collections during the execution of testMethod
. That would account for the ~7% variability in your "steady state" results.
Separating the cost of allocating an object from the cost of freeing it is likely to give you misleading results. The "total" cost of allocating an object includes the cost of zeroing the memory when it is recycled ... and that is done by the garbage collector.
In fact, the most useful measure is the amortized cost per-object of an allocate / collect cycle. And that (surprisingly) depends on the amount of non-garbage when the GC runs ... which is something that your benchmark doesn't take into account.