Search code examples
javaperformancejitjvm-hotspot

Performance of Collections.emptyList and empty ArrayList with JIT compiler


Is there a performance difference between using Collections.emptyList() or an empty ArrayList, especially when using a JIT compiler?

I could imagine that - for example - the JIT compiler doesn't do inlining or static method calls because the executed method depends on the type.

Edit I know that Collections.emptyList() returns an immutable list while ArrayList is mutable object.

What I mean is that if I pass one or the other to a method as a parameter and the method doesn't modify the list does that restrict the possibilities for the JIT compiler to optimize the method?

A simple example (just to clarify what I mean):

int sum(List<Integer> list)
{
    int sum = 0;

    for(int i=0;i<list.size();++i)
      sum += list.get(i);

    return sum;
}

If I only called this method with ArrayList the JIT compiler could inline ArrayList.get(). If I also made calls with Collections.empty() it wouldn't be possible.

Is that correct?


Solution

  • Disclamer

    All that is written below applies only to HotSpot JVM.

    Short Answers

    the JIT compiler doesn't do inlining or static method calls because the executed method depends on the type.

    This is opposite of true. See my answer.

    Is there a performance difference between using Collections.emptyList() or an empty ArrayList, especially when using a JIT compiler?

    In rare cases - yes. See microbenchmark results.

    If I only called this method with ArrayList the JIT compiler could inline ArrayList.get(). If I also made calls with Collections.empty() it wouldn't be possible. Is that correct?

    The short answer - it depends. JIT compiler is smart enough to recognize monomorphic, bimorphic and polimorphic call patterns and provide appropriate implementations.

    Answer

    In order to get a detailed answer I would recommend to read the following post about black magic of method dispatching. In few words

    C2 does an interesting profile-guided optimization based on the observed type profile. If there is only a single receiver type (that is, the call site is monomorphic), it can simply check for the predicted type, and inline the target directly. The same optimization can and will be applied if there are two receiver types observed (that is, the call site is bimorphic), at the cost of two branches.

    Let's consider the following JMH example(if you haven’t learned about JMH then I suggest to read about it here).

    @State(Scope.Benchmark)
    @BenchmarkMode(Mode.AverageTime)
    @OutputTimeUnit(TimeUnit.NANOSECONDS)
    @Fork(value = 5)
    public class ExampleBench {
    
        @Param("10000")
        private int count;
    
        List<Integer>[] arrays;
        List<Integer>[] empty;
        List<Integer>[] bimorphic;
        List<Integer>[] polimorphic;
    
        @Setup
        public void setup(){
            Random r = new Random(0xBAD_BEEF);
            arrays = new List[count];
            empty = new List[count];
            bimorphic = new List[count];
            polimorphic = new List[count];
            for (int i = 0; i < arrays.length; i++) {
                bimorphic[i] = r.nextBoolean() ? new ArrayList<Integer>(0) : Collections.<Integer>emptyList();
                int i1 = r.nextInt(3);
                switch (i1) {
                    case 0 : polimorphic[i] = new ArrayList<>(0);
                        break;
                    case 1 : polimorphic[i] = new LinkedList<>();
                        break;
                    case 2 : polimorphic[i] = Collections.emptyList();
                        break;
                }
                arrays[i] = new ArrayList<>(0);
                empty[i] = Collections.emptyList();
            }
        }
    
        @Benchmark
        public float arrayList() {
            List<Integer>[] l = arrays;
            int c = count;
            float result = 0;
            for (int i = 0; i < c; i++) {
                result += sum(l[i]);
            }
            return result;
        }
    
        @Benchmark
        public float emptyList() {
            List<Integer>[] l = empty;
            int c = count;
            float result = 0;
            for (int i = 0; i < c; i++) {
                result += sum(l[i]);
            }
            return result;
        }
    
        @Benchmark
        public float biList() {
            List<Integer>[] l = bimorphic;
            int c = count;
            float result = 0;
            for (int i = 0; i < c; i++) {
                result += sum(l[i]);
            }
            return result;
        }
    
        @Benchmark
        public float polyList() {
            List<Integer>[] l = polimorphic;
            int c = count;
            float result = 0;
            for (int i = 0; i < c; i++) {
                result += sum(l[i]);
            }
            return result;
        }
    
        int sum(List<Integer> list) {
            int sum = 0;
            for (int i = 0; i < list.size(); ++i) {
                sum += list.get(i);
            }
            return sum;
        }
    }
    

    Results are:

    Benchmark               (count)  Mode  Cnt       Score       Error  Units
    ExampleBench.arrayList    10000  avgt    5   22902.547 ± 27665.651  ns/op
    ExampleBench.biList       10000  avgt    5   50459.552 ±   739.379  ns/op
    ExampleBench.emptyList    10000  avgt    5    3745.469 ±   211.794  ns/op
    ExampleBench.polyList     10000  avgt    5  164879.943 ±  5830.008  ns/op
    

    In case of monomorphic and bimorphic calls JIT replaces virtual call by concrete implementations. For example in case of arrayList() we have the following output for -XX:+PrintInlining:

    @ 27   edu.jvm.runtime.ExampleBench::sum (38 bytes)   inline (hot)
       @ 6   java.util.ArrayList::size (5 bytes)   accessor
        \-> TypeProfile (15648/15648 counts) = java/util/ArrayList
    

    for emptyList():

    @ 27   edu.jvm.runtime.ExampleBench::sum (38 bytes)   inline (hot)
       @ 6   java.util.Collections$EmptyList::size (2 bytes)   inline (hot)
        \-> TypeProfile (9913/9913 counts) = java/util/Collections$EmptyList
    

    for biList():

    @ 27   edu.jvm.runtime.ExampleBench::sum (38 bytes)   inline (hot)
       @ 6   java.util.Collections$EmptyList::size (2 bytes)   inline (hot)
       @ 6   java.util.ArrayList::size (5 bytes)   accessor
        \-> TypeProfile (2513/5120 counts) = java/util/ArrayList
        \-> TypeProfile (2607/5120 counts) = java/util/Collections$EmptyList
    

    In case of polyList() JIT does not inline any implementation and uses true virtual call.

    What is the advantages of using inline functions in these methods? Let's look at compiler-generated code for arrayList():

    0x00007ff9e51bce50: cmp $0xf80036dc,%r10d     ;instance of 'java/util/ArrayList'
    0x00007ff9e51bce57: jne L0000                 ;if false go to L0000 (invokeinterface size)
    0x00007ff9e51bce59: mov 0x10(%rdx),%ebp       ;*getfield size optimization java.util.ArrayList::size@1 
    
    .....
    
    0x00007ff9e51bce6d: retq
                 L0000: mov $0xffffffde,%esi      ; true virtual call starts here
    0x00007ff9e51bce73: mov %rdx,(%rsp)
    0x00007ff9e51bce77: callq 0x00007ff9e50051a0  ; OopMap{[0]=Oop off=92}
                                                  ;*invokeinterface size
                                                  ; - edu.jvm.runtime.ExampleBench::sum@6 (line 119)
                                                  ;   {runtime_call}
    

    As you can see JIT replaces virtual call by getfield.