Search code examples
javadata-structuresjava-8hashmap

The worst case time complexity for HashMap after Java 8 is still O(n) not O(log n)?


I encountered an interview question today related to Java's HashMaps and I'm seeking clarification on a few points regarding the time complexity of certain operations and key comparisons in Java 8 and later versions.

Given the following code snippet, what is the time complexity of the map.get(new Key(1)); line, considering Java 8's improvements to HashMap?

import java.util.*;

class Key {
    int v;

    public Key(int v) {
        this.v = v;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (o == null || getClass() != o.getClass()) return false;
        Key key = (Key) o;
        return v == key.v;
    }

    @Override
    public int hashCode() {
        return 0; // always hash collision
    }
}

class Solution {
    public static void main(String[] args) {
        int n = 1_000_000;
        Map<Key, Integer> map = new HashMap<>();
        for (int i = 0; i < n; i++) {
            Key key = new Key(i);
            map.put(key, i);
        }
        map.get(new Key(1)); // What's the time complexity of this line?
    }
}

I initially answered O(log n), mentioning that Java 8 automatically converts lengthy linked lists in buckets into Red-Black Trees for efficiency.

Q1: However, the interviewer then inquired about how comparison is done within the RB Tree when the Key class doesn't implement the Comparable interface.

I answered that hashCode is first used for comparison, since even in the same bucket it just means that h1 % length == h2 % length. Upon further questioning about the scenario where hash codes are identical, I mentioned the use of System.identityHashCode. My answer is same as https://stackoverflow.com/a/43911638/10969942

Q2: This leads to a discussion about the case where two keys are equals but likely have different System.identityHashCode values:

Key k1 = new Key(1); 
Key k2 = new Key(1);

I speculated that it might require traversing all entries in the subtree with the same hash code. Then the interviewer asked whether it means that the worst case time complexity is O(n)? I don't know since I always heard that after Java8, HashMap can have guaranteed worst case O(log n) time complexity.

Q3: The interviewer then questioned how two different keys with the same System.identityHashCode can be inserted into the map given that an RB Tree does not allow duplicate keys:

Key k1 = new Key(1); // same identityHashCode
Key k2 = new Key(2); // same identityHashCode

I remember it being possible for different objects to have the same System.identityHashCode https://stackoverflow.com/a/1063100/10969942, but I'm unclear on how the HashMap manages these insertions without violating the RB Tree's constraints on no duplicate keys, that is handle both hashCode and identityHashCode collision.


Solution

  • Given the following code snippet, what is the time complexity of the map.get(new Key(1)); line, considering Java 8's improvements to HashMap?

    It is O(n).

    I initially answered O(log n), mentioning that Java 8 automatically converts lengthy linked lists in buckets into Red-Black Trees for efficiency.

    Yes. In this case, every key has the same hash code, so there will be only one hash bucket. With enough entries it will be organized as a balanced tree.

    Q1: However, the interviewer then inquired about how comparison is done within the RB Tree when the Key class doesn't implement the Comparable interface.

    As you said, the tree order falls back to identity hashcode when hash codes are equal and the key type does not have a natural order.

    Q2: This leads to a discussion about the case where two keys are equals but likely have different System.identityHashCode values:

    This is indeed a problem for HashMap. If a key to be looked up is the same object as one already in the map, then it will be found in O(log n) steps, but if it isn't then the map still has to be searched for one equals() to it. That will take O(n) steps if such a key is present, and Θ(n) steps if not.

    Then the interviewer asked whether it means that the worst case time complexity is O(n)?

    Yes, it does.

    I don't know since I always heard that after Java8, HashMap can have guaranteed worst case O(log n) time complexity.

    I hope you didn't say that. A good interviewer doesn't expect you to know everything, and people can acquire misinformation in a wide variety of ways. But having led you to the conclusion, they could reasonably expect you to accept it. At minimum to acknowledge that the conclusion seems sound.

    Q3: The interviewer then questioned how two different keys with the same System.identityHashCode can be inserted into the map given that an RB Tree does not allow duplicate keys:

    I guess the supposition here is

    it being possible for different objects to have the same System.identityHashCode

    It is true that Java does not guarantee distinct identity hash codes, but it does specify:

    As far as is reasonably practical, the hashCode method defined by class Object returns distinct integers for distinct objects.

    (Object's implementation of hashCode() is, of course, the identity hash code.)

    I'm unclear on how the HashMap manages these insertions without violating the RB Tree's constraints on no duplicate keys, that is handle both hashCode and identityHashCode collision.

    I don't know off the top of my head, and a quick doc check did not turn up an answer. But assuming that HashMap does handle that case, I expect it does so with a linked list in each tree node, or similar. This does not present the same issue here that using a linked list did at the bucket level, because the standard library implementation can be relied upon to provide reasonably good identity hash codes (see above).

    Even if that's not what HashMap actually does, I suspect that the interviewer would have been satisfied with it. Again, they don't expect you to know everything, but they likely are looking for people who are prepared to think.