Search code examples
javaperformancearraylistcapacity

Why is this slower when giving ArrayList an initial capacity?


For an experiment, I made this little program. It just generates 10 million random strings and adds them to an arraylist. Notice that the arraylist does not have an initial capacity.

// editors note: added the necessary boilerplate to run,
// and take initial capacity as an optional cmdline arg for easier testing
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

class ArrayListTest {
    public static void main(String[] args)
    {
        int initsize = -1;
        if (args.length > 0) {
            initsize = Integer.parseInt(args[0]);
        }

        long startTime = System.currentTimeMillis();

        List<String> randNums = initsize>=0 ? new ArrayList<>(initsize) : new ArrayList<>();
        // final List<String> randNums = initsize>=0 ? new ArrayList<String>(initsize) : new ArrayList<String>();

        Random r = new Random(1);

        for (int i = 0; i < 10_000_000; i++) {
            final int randomNum = r.nextInt();
            randNums.add(Integer.toString(randomNum));
        }

        System.out.println(System.currentTimeMillis() - startTime);
    }
}

I ran it 5 times, and the results in ms were:

8917
8720
7814
8768
8867

I then altered the program to give the ArrayList an initial capacity:

    List<String> randNums = new ArrayList<>(10_000_000);

I again ran it 5 times, and these were the results:

11562
10454
10499
10481
10415

It definitely consistently got slower. I assumed it would have been the opposite, since by declaring a big enough initial size, I took away all the overhead of the ArrayList increasing its own capacity. Why was it slower?

Further Info: Jre 1.8.051, 64-bit (On windows 10);


Solution

  • You'd think it was the other way around, but it has to do with Garbage Collection.

    I didn't see the big difference you did (3900 vs 5100), but since this is GC related, you're probably running with lower memory. I ran with 64-bit and -Xmx4g.

    Turning on the GC log (-Xloggc:path\to\file.log), I get this without size:

    Java HotSpot(TM) 64-Bit Server VM (25.51-b03) for windows-amd64 JRE (1.8.0_51-b16), built on Jun  8 2015 18:03:07 by "java_re" with MS VC++ 10.0 (VS2010)
    Memory: 4k page, physical 33478384k(25822824k free), swap 33476532k(26121788k free)
    CommandLine flags: -XX:InitialHeapSize=535654144 -XX:MaxHeapSize=4294967296 -XX:+PrintGC -XX:+PrintGCTimeStamps -XX:+UseCompressedClassPointers -XX:+UseCompressedOops -XX:-UseLargePagesIndividualAllocation -XX:+UseParallelGC 
    0.188: [GC (Allocation Failure)  131584K->114906K(502784K), 0.0795857 secs]
    0.358: [GC (Allocation Failure)  246490K->229080K(634368K), 0.0794026 secs]
    0.631: [GC (Allocation Failure)  492248K->452871K(716288K), 0.1389293 secs]
    0.770: [Full GC (Ergonomics)     452871K->451407K(1188864K), 3.3224726 secs]
    4.270: [GC (Allocation Failure)  714575K->714179K(1271808K), 0.2607092 secs]
    4.531: [Full GC (Ergonomics)     714179K->814K(1050624K), 0.0070693 secs]
    

    And I get this with the size:

    Java HotSpot(TM) 64-Bit Server VM (25.51-b03) for windows-amd64 JRE (1.8.0_51-b16), built on Jun  8 2015 18:03:07 by "java_re" with MS VC++ 10.0 (VS2010)
    Memory: 4k page, physical 33478384k(25818308k free), swap 33476532k(26123684k free)
    CommandLine flags: -XX:InitialHeapSize=535654144 -XX:MaxHeapSize=4294967296 -XX:+PrintGC -XX:+PrintGCTimeStamps -XX:+UseCompressedClassPointers -XX:+UseCompressedOops -XX:-UseLargePagesIndividualAllocation -XX:+UseParallelGC 
    0.171: [GC (Allocation Failure)  131584K->129070K(502784K), 0.0919490 secs]
    0.370: [GC (Allocation Failure)  260654K->261166K(634368K), 0.4433150 secs]
    0.813: [Full GC (Ergonomics)     261166K->260351K(899072K), 1.4135289 secs]
    2.460: [GC (Allocation Failure)  523519K->524487K(899072K), 0.7901689 secs]
    3.250: [Full GC (Ergonomics)     524487K->523517K(1421312K), 2.3177831 secs]
    

    Because the second run allocated so much more memory initially, the GC runs get slower. That apparently outweighs the extra array copying going on when the list resizes.