Search code examples
javascalacollectionsperformance-testingscalameter

ArrayList and HashSet memory allocation strange test results


I was inspired by this topic: Performance and Memory allocation comparision between List and Set to actually run some tests and measure the performance difference between ArrayList and HashSet.

The most upvoted answer, in the mentioned topic, that intrigued me a lot (link), says:

HashSet consumes about 5.5 times more memory than ArrayList for the same number of elements

With the help of ScalaMeter I wanted to make sure about this.

I made two, simple tests, adding from 10000 to 100000 elements to both ArrayList and HashSet. Setting the initial size to the maximum did not change the results. I tested those collections with two types:

  • Int (putting consecutive numbers 0 to 100000)
  • String (putting random String using Apache RandomStringUtils)

The code is available on my repository here.

And running those, gave me this results:

  • X-axis - size -> size of the collection
  • Y-axis - value -> amount of kB used

For collections holding Int: Integer results

For collections holding String of size 10: String results size 10

For collections holding String of size 50: String results size 50

The question:

What happened to the theory mentioned in the quoted answer? Is it false? Or probably there is some mistake on my side?

Thanks :)!

Update after @andrzej answer I have updated the code (and the repository) once again. The results are getting better but still the results are not 5.5x different. I am checking something more now.


Solution

  • What happened to the theory mentioned in the quoted answer? Is it false?

    We can do some calculations to get an estimate:

    Let's look at the OpenJDK source for ArrayList and HashMap (since HashSet is just a wrapper around HashMap) for hints.

    Assume you have n elements to store.

    ArrayList

    Elements are stored in the field transient Object[] elementData;. So length of elementData must be at least n.
    Suppose you instantiated the list with new ArrayList<>(n) and so elementData.length is exactly n. Then the size of your list is n*c bytes (where c is the size of an object reference). Here I ignored the size field and the object header of the list.

    HashMap

    HashMap stores elements in transient Node<K,V>[] table; where a node has fields

    final int hash;
    final K key;
    V value;
    Node<K,V> next;
    

    Then for storing n elements you need n nodes or n*(3*c + 4) bytes i.e each node has 3 object references - 3*c bytes - and an int - 4 bytes.
    According to HashMap javadoc:

    When the number of entries in the hash table exceeds the product of the load factor and the current capacity, the hash table is rehashed (that is, internal data structures are rebuilt) so that the hash table has approximately twice the number of buckets.

    Based on this I'll estimate that table.length == 2*n.
    Summing up a hashmap requires n*2*c + n*(3*c + 4) = n*5*c + n*4 bytes.

    Summary

    Now let's suppose you have a 64bit JVM and the size of an object reference is 8 bytes (i.e c = 8) (let's igntore things like compressed oops). Then n*5*c + n*4 = n*5*8 + n*4 = n*44 and n*c = n*8.
    Finally n*44 / n*8 = 5.5

    So the original theory that HashSet consumes about 5.5 times more memory than ArrayList seems quite plausible and it seems likely that something is not right with your measurements.