Tags: java, algorithm, hashmap, time-complexity

What is the complexity of HashMap#replace in Java?


I was wondering what the complexity of replace(key, value) for a HashMap is.

My initial thought is O(1), since it's O(1) to get the value and I can simply replace the value assigned to the key.

I'm unsure whether I should take into account the collisions that might occur in a large HashMap (java.util.HashMap).


Solution

  • tl;dr

    HashMap#replace runs in O(1) amortized

    and, under the premise that the map is properly balanced (which Java takes care of during your put and remove calls), also in O(1) non-amortized.

    Non-amortized

    Whether the bound also holds in a non-amortized analysis hinges on the self-balancing mechanism of the implementation.

    Basically, since replace only changes the value, which influences neither the hashing nor the general structure of the HashMap, replacing a value will not trigger any re-hashing or re-organization of the internal structure.

    Hence we only pay for the cost of locating the key, which depends on the bucket size.

    The bucket size, if the map is properly self-balanced, can be considered a constant, leading to O(1) for replace non-amortized as well.

    However, the implementation triggers self-balancing and re-hashing based on heuristic factors only. A deep analysis of that is a bit more complex.

    So the reality is probably somewhere in between due to the heuristics.
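
    To see the amortized distinction in action, here is a minimal sketch (the class and key names are made up for illustration): put occasionally resizes and re-hashes the whole table, which is why its O(1) is only amortized, whereas replace never restructures anything:

    import java.util.HashMap;
    import java.util.Map;

    public class AmortizedDemo {
        public static void main(String[] args) {
            Map<String, Integer> map = new HashMap<>(16);
            // put is amortized O(1): with the default load factor of 0.75,
            // inserting 100 entries resizes (and re-hashes) the table a few times.
            for (int i = 0; i < 100; i++) {
                map.put("key" + i, i);
            }
            // replace is a pure value swap on an existing node; it never
            // triggers a resize or re-hash, regardless of the map's size.
            map.replace("key42", -1);
        }
    }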


    Implementation

    To be sure, let us take a look at the current implementation (Java 16):

    @Override
    public V replace(K key, V value) {
        Node<K,V> e;
        if ((e = getNode(key)) != null) {
            V oldValue = e.value;
            e.value = value;
            afterNodeAccess(e);
            return oldValue;
        }
        return null;
    }
    

    The method afterNodeAccess is a hook for subclasses (such as LinkedHashMap) and is empty in HashMap. Everything except getNode trivially runs in O(1).
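
    For illustration, a quick usage sketch of replace at the call site (the class name is made up):

    import java.util.HashMap;
    import java.util.Map;

    public class ReplaceDemo {
        public static void main(String[] args) {
            Map<String, Integer> scores = new HashMap<>();
            scores.put("alice", 10);
            Integer old = scores.replace("alice", 20); // key present: value swapped, returns 10
            Integer none = scores.replace("bob", 5);   // key absent: no-op, returns null
            System.out.println(old + ", " + none);     // prints "10, null"
        }
    }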

    getNode

    getNode is the canonical implementation of locating an entry in a HashMap, which we know runs in O(1) for a properly self-balanced map, like Java's implementation. Let's take a look at the code:

    /**
     * Implements Map.get and related methods.
     *
     * @param key the key
     * @return the node, or null if none
     */
    final Node<K,V> getNode(Object key) {
        Node<K,V>[] tab; Node<K,V> first, e; int n, hash; K k;
        if ((tab = table) != null && (n = tab.length) > 0 &&
            (first = tab[(n - 1) & (hash = hash(key))]) != null) {
            if (first.hash == hash && // always check first node
                ((k = first.key) == key || (key != null && key.equals(k))))
                return first;
            if ((e = first.next) != null) {
                if (first instanceof TreeNode)
                    return ((TreeNode<K,V>)first).getTreeNode(hash, key);
                do {
                    if (e.hash == hash &&
                        ((k = e.key) == key || (key != null && key.equals(k))))
                        return e;
                } while ((e = e.next) != null);
            }
        }
        return null;
    }
    

    This method basically computes the hash (hash = hash(key)), looks up the corresponding bucket in the table (first = tab[(n - 1) & hash]) and then iterates through the data structure stored in that bucket.
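
    As a side note, hash(key) is not the raw hashCode: HashMap first spreads the high bits into the low bits so that the power-of-two mask (n - 1) does not discard them. A standalone sketch of the index computation, mirroring the JDK logic (the class and method names here are made up):

    // Sketch of how a HashMap-style table derives the bucket index from a key.
    final class HashSketch {
        static int bucketIndex(Object key, int tableLength) {
            int h = (key == null) ? 0 : key.hashCode();
            int hash = h ^ (h >>> 16);        // spread the high bits into the low bits
            return (tableLength - 1) & hash;  // tableLength is a power of two, so this masks cheaply
        }
    }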

    Regarding the data structure for the bucket, we have a little branching going on at if (first instanceof TreeNode).

    Bucket

    The buckets are either simple implicitly linked lists or red-black-trees.

    Linked List

    For the linked list, we have a straightforward iteration

    do {
         if (e.hash == hash &&
            ((k = e.key) == key || (key != null && key.equals(k))))
            return e;
    } while ((e = e.next) != null);
    

    which obviously runs in O(m) with m being the size of the linked list.

    Red-Black-Tree

    For the red-black-tree, we have

    return ((TreeNode<K,V>)first).getTreeNode(hash, key);
    

    Lookup in a red-black-tree is O(log m), with m being the size of the tree.
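
    Note that the O(log m) bound relies on colliding keys being mutually Comparable; otherwise the tree has to fall back to weaker tie-breaking. A hypothetical key type that collides on purpose but still lets a treeified bucket search efficiently:

    // Hypothetical key type: all instances share one hash, but because the type
    // is Comparable, a treeified bucket can still locate a key in O(log m).
    final class CollidingKey implements Comparable<CollidingKey> {
        final int id;
        CollidingKey(int id) { this.id = id; }
        @Override public int hashCode() { return 7; } // deliberate collisions
        @Override public boolean equals(Object o) {
            return o instanceof CollidingKey && ((CollidingKey) o).id == id;
        }
        @Override public int compareTo(CollidingKey other) {
            return Integer.compare(this.id, other.id);
        }
    }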

    Bucket size

    Java's implementation makes sure to re-balance the buckets by re-hashing if it detects that a bucket gets out of hand (you pay for that in each modifying method like put or remove).

    So in both cases we can consider the size of the buckets as constant or, due to the heuristics involved with self-balancing, close to a constant.
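
    For reference, those heuristics boil down to a handful of constants in the java.util.HashMap source (values as of recent JDKs):

    static final float DEFAULT_LOAD_FACTOR = 0.75f; // resize once size exceeds capacity * 0.75
    static final int TREEIFY_THRESHOLD = 8;         // turn a bucket's list into a tree at 8 nodes
    static final int UNTREEIFY_THRESHOLD = 6;       // turn it back into a list once it shrinks to 6
    static final int MIN_TREEIFY_CAPACITY = 64;     // below this table size, resize instead of treeifying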

    Conclusion

    Having the buckets at a constant size effectively makes getNode run in O(1), leading to replace running in O(1) as well.

    Without any self-balancing mechanism, the worst case would degrade to O(n) if a linked list is used and to O(log n) for a red-black-tree (for the case that all keys yield a hash collision).
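
    You can provoke that degenerate case yourself with a deliberately bad, hypothetical key type whose hashCode is constant:

    import java.util.HashMap;
    import java.util.Map;

    public class CollisionDemo {
        // Every instance hashes to the same bucket on purpose.
        static final class BadKey {
            final int id;
            BadKey(int id) { this.id = id; }
            @Override public int hashCode() { return 42; } // constant hash: everything collides
            @Override public boolean equals(Object o) {
                return o instanceof BadKey && ((BadKey) o).id == id;
            }
        }

        public static void main(String[] args) {
            Map<BadKey, Integer> map = new HashMap<>();
            for (int i = 0; i < 10_000; i++) {
                map.put(new BadKey(i), i);
            }
            // Every lookup, and hence every replace, now has to search one
            // oversized bucket instead of enjoying the usual O(1) distribution.
            map.replace(new BadKey(1234), -1);
        }
    }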

    Feel free to dig deeper into the code, but it gets a bit more complex down there.