Search code examples
javaalgorithmdictionaryhashmap

Best HashMap initial capacity while indexing a List


I have a list (List<T> list) and I want to index its objects by their ids using a map (HashMap<Integer, T> map). I always use list.size() as the initial capacity in the HashMap constructor,like in the code below. Is this the best initial capacity to be used in this case?

Note: I'll never add more items to the map.

List<T> list = myList;
Map<Integer, T> map = new HashMap<Integer, T>(list.size());
for(T item : list) {
    map.put(item.getId(), item);
}

Solution

  • If you wish to avoid rehashing the HashMap, and you know that no other elements will be placed into the HashMap, then you must take into account the load factor as well as the initial capacity. The load factor for a HashMap defaults to 0.75.

    The calculation to determine whether rehashing is necessary occurs whenever an new entry is added, e.g. put places a new key/value. So if you specify an initial capacity of list.size(), and a load factor of 1, then it will rehash after the last put. So to prevent rehashing, use a load factor of 1 and a capacity of list.size() + 1.

    EDIT

    Looking at the HashMap source code, it will rehash if the old size meets or exceeds the threshold, so it won't rehash on the last put. So it looks like a capacity of list.size() should be fine.

    HashMap<Integer, T> map = new HashMap<Integer, T>(list.size(), 1.0);
    

    Here's the relevant piece of HashMap source code:

    void addEntry(int hash, K key, V value, int bucketIndex) {
        Entry<K,V> e = table[bucketIndex];
        table[bucketIndex] = new Entry<>(hash, key, value, e);
        if (size++ >= threshold)
            resize(2 * table.length);
    }