I was inspired by this topic: Performance and Memory allocation comparision between List and Set to actually run some tests and measure the performance difference between ArrayList
and HashSet
.
The most upvoted answer, in the mentioned topic, that intrigued me a lot (link), says:
HashSet consumes about 5.5 times more memory than ArrayList for the same number of elements
With the help of ScalaMeter I wanted to make sure about this.
I made two, simple tests, adding from 10000
to 100000
elements to both ArrayList
and HashSet
. Setting the initial size to the maximum did not change the results. I tested those collections with two types:
Int
(putting consecutive numbers 0 to 100000)String
(putting random String using Apache RandomStringUtils
)The code is available on my repository here.
And running those, gave me this results:
For collections holding String
of size 10:
For collections holding String
of size 50:
The question:
What happened to the theory mentioned in the quoted answer? Is it false? Or probably there is some mistake on my side?
Thanks :)!
Update after @andrzej answer I have updated the code (and the repository) once again. The results are getting better but still the results are not 5.5x different. I am checking something more now.
What happened to the theory mentioned in the quoted answer? Is it false?
We can do some calculations to get an estimate:
Let's look at the OpenJDK source for ArrayList and HashMap (since HashSet
is just a wrapper around HashMap
) for hints.
Assume you have n
elements to store.
Elements are stored in the field transient Object[] elementData;
. So length of elementData
must be at least n
.
Suppose you instantiated the list with new ArrayList<>(n)
and so elementData.length
is exactly n
.
Then the size of your list is n*c
bytes (where c
is the size of an object reference). Here I ignored the size
field and the object header of the list.
HashMap stores elements in transient Node<K,V>[] table;
where a node has fields
final int hash;
final K key;
V value;
Node<K,V> next;
Then for storing n
elements you need n
nodes or n*(3*c + 4)
bytes i.e each node has 3 object references - 3*c
bytes - and an int
- 4 bytes.
According to HashMap javadoc:
When the number of entries in the hash table exceeds the product of the load factor and the current capacity, the hash table is rehashed (that is, internal data structures are rebuilt) so that the hash table has approximately twice the number of buckets.
Based on this I'll estimate that table.length == 2*n
.
Summing up a hashmap requires n*2*c + n*(3*c + 4) = n*5*c + n*4
bytes.
Now let's suppose you have a 64bit JVM and the size of an object reference is 8 bytes (i.e c = 8
) (let's igntore things like compressed oops).
Then n*5*c + n*4 = n*5*8 + n*4 = n*44
and n*c = n*8
.
Finally n*44 / n*8 = 5.5
So the original theory that HashSet
consumes about 5.5 times more memory than ArrayList
seems quite plausible and it seems likely that something is not right with your measurements.