I expected that allocating a two-dimensional array directly would be faster, but allocating the rows in a loop actually takes less than half the time. What makes the direct allocation so much slower?
Here is the test code:
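import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MICROSECONDS)
public class Test_newArray {
    private static int num = 10000;
    private static int length = 10;

    // Allocates the whole two-dimensional array in a single expression.
    @Benchmark
    public static int[][] newArray() {
        return new int[num][length];
    }

    // Allocates the outer array first, then each row in a loop.
    @Benchmark
    public static int[][] newArray2() {
        int[][] temps = new int[num][];
        for (int i = 0; i < temps.length; i++) {
            temps[i] = new int[length];
        }
        return temps;
    }
}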
The test results are as follows.
Benchmark                Mode  Cnt    Score   Error  Units
Test_newArray.newArray   avgt   25  289.254 ± 4.982  us/op
Test_newArray.newArray2  avgt   25  114.364 ± 1.446  us/op
The test environment is as follows:
JMH version: 1.21
VM version: JDK 1.8.0_212, OpenJDK 64-Bit Server VM, 25.212-b04
In Java there is a separate bytecode instruction for allocating multidimensional arrays: multianewarray. The newArray benchmark uses the multianewarray bytecode; newArray2 invokes a simple newarray in the loop.
The problem is that HotSpot JVM has no fast path* for the multianewarray bytecode. This instruction is always executed in the VM runtime. Therefore, the allocation is not inlined in the compiled code.
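To see the two bytecode shapes side by side, here is a minimal sketch. The class and method names (MultiArrayDemo, direct, rowByRow) are made up for illustration and are not part of the benchmark. Compiling it and running `javap -c MultiArrayDemo` should show multianewarray in the first method, and anewarray plus newarray in the second.

```java
import java.util.Arrays;

// Hypothetical demo class; the names are illustrative only.
public class MultiArrayDemo {

    // javac compiles this expression to a single multianewarray instruction.
    static int[][] direct(int rows, int cols) {
        return new int[rows][cols];
    }

    // The outer array is created with anewarray, and each row with newarray.
    static int[][] rowByRow(int rows, int cols) {
        int[][] result = new int[rows][];
        for (int i = 0; i < result.length; i++) {
            result[i] = new int[cols];
        }
        return result;
    }

    public static void main(String[] args) {
        int[][] a = direct(3, 4);
        int[][] b = rowByRow(3, 4);
        // Both forms produce the same shape and contents (all zeros).
        System.out.println(Arrays.deepEquals(a, b)); // prints "true"
    }
}
```

The resulting arrays are indistinguishable to the program; only the way the allocation is carried out differs.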
The first benchmark has to pay the performance penalty of switching between the Java and VM runtime contexts. Also, the common allocation code in the VM runtime (written in C++) is not as optimized as inlined allocation in JIT-compiled code, simply because it is generic: it is not optimized for the particular object type or for the particular call site, it performs additional runtime checks, and so on.
Here are the results of profiling both benchmarks with async-profiler. I used JDK 11.0.4, but for JDK 8 the picture looks similar.
In the first case, 99% of the time is spent inside OptoRuntime::multianewarray2_C, the C++ code in the VM runtime.
In the second case, most of the graph is green, meaning that the program runs mostly in the Java context, actually executing JIT-compiled code optimized specifically for the given benchmark.
EDIT
* Actually, HotSpot JVM can inline multianewarray, but only if the total number of allocations does not exceed -XX:MultiArrayExpandLimit, which defaults to 6.
So, for example, new int[5][10] will be allocated inline in the JIT-compiled code, while the allocation of new int[10][5] will go through the VM runtime.
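The arithmetic behind these two examples can be sketched as follows. This is only a model of the "total number of allocations" rule as stated above, not HotSpot's actual code; the class ExpandLimitDemo, its methods, and the ASSUMED_LIMIT constant are hypothetical, and the real compiler may apply further conditions that are not modeled here.

```java
// Hypothetical helper that models the allocation-count rule described above.
public class ExpandLimitDemo {

    // Assumed to mirror the default value of -XX:MultiArrayExpandLimit.
    static final int ASSUMED_LIMIT = 6;

    // Counts the arrays created by new T[d0][d1]...[dk]:
    // 1 outermost array, d0 arrays at the next level, d0*d1 at the one after, etc.
    static long allocationCount(int... dims) {
        long count = 1;   // the outermost array
        long fanout = 1;  // number of arrays at the current nesting level
        for (int i = 0; i < dims.length - 1; i++) {
            fanout *= dims[i];
            count += fanout;
        }
        return count;
    }

    static boolean fitsLimit(int... dims) {
        return allocationCount(dims) <= ASSUMED_LIMIT;
    }

    public static void main(String[] args) {
        System.out.println(allocationCount(5, 10) + " " + fitsLimit(5, 10)); // prints "6 true"
        System.out.println(allocationCount(10, 5) + " " + fitsLimit(10, 5)); // prints "11 false"
    }
}
```

new int[5][10] creates 1 outer array plus 5 rows, so 6 allocations in total, which is within the limit. new int[10][5] creates 1 + 10 = 11, which exceeds it.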