data-structures, hashmap

Is a hash table really O(1)?


The way I understand it, a hash table is an array of linked lists, so it should actually be O(n/array_length).
Isn't saying it's O(1) just plain wrong?

For example, if I have 1M items in a hash table backed by a 100-element array, the average lookup would scan about 5,000 items. That is clearly not O(1), although I assume that most implementations of a hash table use a much larger array.

And what array size is typically used in most languages' (JS, Go, etc.) hash table implementations?


Solution

  • You are correct. In general, it is not possible to say that all hash tables are O(1). It depends on some design decisions, and on other factors.

    In your example, you seem to be talking about a hash table with a fixed number of buckets (100) and an unbounded number of entries N. With that design, it will take on average about N / 200 comparisons to find a key that is present, and about N / 100 comparisons to discover that a key is not present. Either way, that is O(N).
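
    To make the arithmetic concrete, here is a minimal sketch of that fixed-bucket design in Java (the class and names are hypothetical, not any real library's implementation). With 1M entries and 100 buckets, every chain holds about 10,000 entries, so get() has to walk thousands of them:

        import java.util.LinkedList;

        // Chained hash table with a FIXED number of buckets. With B buckets and
        // N entries, each chain holds roughly N / B entries, so lookups are O(N).
        class FixedBucketTable<K, V> {
            private static final int BUCKETS = 100;            // never resized
            private final LinkedList<Entry<K, V>>[] table;

            @SuppressWarnings("unchecked")
            FixedBucketTable() {
                table = new LinkedList[BUCKETS];
                for (int i = 0; i < BUCKETS; i++) {
                    table[i] = new LinkedList<>();
                }
            }

            private int bucketOf(K key) {
                // Math.floorMod avoids negative indices for negative hash codes.
                return Math.floorMod(key.hashCode(), BUCKETS);
            }

            V get(K key) {
                // Walks the chain: O(N / BUCKETS) comparisons, which is O(N).
                for (Entry<K, V> e : table[bucketOf(key)]) {
                    if (e.key.equals(key)) {
                        return e.value;
                    }
                }
                return null;
            }

            void put(K key, V value) {
                LinkedList<Entry<K, V>> chain = table[bucketOf(key)];
                for (Entry<K, V> e : chain) {
                    if (e.key.equals(key)) {                    // overwrite existing mapping
                        e.value = value;
                        return;
                    }
                }
                chain.add(new Entry<>(key, value));
            }

            private static final class Entry<K, V> {
                final K key;
                V value;
                Entry(K key, V value) { this.key = key; this.value = value; }
            }
        }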

    There is a common implementation strategy to deal with this. As the hash table gets larger, you periodically resize the primary array and then redistribute the keys / entries. For example, the standard Java HashMap and Hashtable classes track the ratio of the number of entries to the array size. When that ratio exceeds a configurable load factor, the primary array size is doubled. (See the javadoc for an explanation of the load factor, and the sketch after the list below.)

    The analysis of this is complicated. However, if we can assume that keys are roughly evenly distributed across the buckets, we get the following:

    • average lookup times that are O(1)
    • average insertion times that are O(1)
    • worst-case insertion times that are O(N) ... when the insertion triggers a resize.
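
    A sketch of that resizing strategy, assuming a load factor of 0.75 (the default that Java's HashMap documents); the class and helper names here are hypothetical:

        import java.util.LinkedList;

        // When size / capacity exceeds the load factor, the bucket array is doubled
        // and every entry is rehashed into it. A single resize costs O(N), but since
        // the capacity doubles each time, the cost amortizes to O(1) per insertion.
        class ResizingTable<K, V> {
            private static final double LOAD_FACTOR = 0.75;
            private LinkedList<Entry<K, V>>[] table;
            private int size = 0;

            @SuppressWarnings("unchecked")
            ResizingTable() {
                table = new LinkedList[16];
            }

            void put(K key, V value) {
                if ((double) (size + 1) / table.length > LOAD_FACTOR) {
                    resize();                                   // rare, but O(N) when it happens
                }
                insert(table, key, value);                      // duplicate-key handling omitted
                size++;
            }

            @SuppressWarnings("unchecked")
            private void resize() {
                LinkedList<Entry<K, V>>[] bigger = new LinkedList[table.length * 2];
                for (LinkedList<Entry<K, V>> chain : table) {
                    if (chain == null) continue;
                    for (Entry<K, V> e : chain) {
                        insert(bigger, e.key, e.value);         // redistribute every entry
                    }
                }
                table = bigger;
            }

            private void insert(LinkedList<Entry<K, V>>[] target, K key, V value) {
                int i = Math.floorMod(key.hashCode(), target.length);
                if (target[i] == null) {
                    target[i] = new LinkedList<>();
                }
                target[i].add(new Entry<>(key, value));
            }

            private static final class Entry<K, V> {
                final K key;
                final V value;
                Entry(K key, V value) { this.key = key; this.value = value; }
            }
        }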

    What if the key distribution is badly skewed?

    This can happen if the hash function is a poor one, or if the process that generates the keys is pathological.

    Well, if you don't do anything about it, the worst case occurs when all keys have the same hash value and end up in the same bucket ... irrespective of the primary array size. This results in O(N) lookup and insertion.

    But there are a couple of ways to mitigate this. A simple way is to perform a second hashing operation on the keys' hashcodes. This helps in some cases. A more complex way is to replace the hash chains with balanced binary trees. This changes the average behavior of lookup and insertion (for the case of pathological keys) from O(N) to O(log N).
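
    For the "second hashing" idea, here is a sketch along the lines of the bit-spreading step that Java 8's HashMap applies to each key's hashCode() before picking a bucket (the method names here are hypothetical):

        // XOR the high bits of the hash code into the low bits, so that keys whose
        // hash codes differ only in the upper bits still land in different buckets.
        final class SupplementalHash {
            static int spread(int hashCode) {
                return hashCode ^ (hashCode >>> 16);
            }

            // Assumes the table length is a power of two, so (length - 1) is a bit mask.
            static int bucketIndex(Object key, int tableLength) {
                return spread(key.hashCode()) & (tableLength - 1);
            }
        }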

    From Java 8 onwards, the HashMap implementation uses either hash chains or trees, depending on the number of keys in a given bucket.
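
    As a small demonstration of the pathological case, consider a (hypothetical) key class whose hashCode() is constant, so every entry collides into one bucket. On Java 8+, once a bucket's chain grows long enough it is converted to a balanced tree, so lookups for Comparable keys stay roughly O(log N) rather than O(N):

        import java.util.HashMap;
        import java.util.Map;

        public class CollisionDemo {
            static final class PathologicalKey implements Comparable<PathologicalKey> {
                final int id;
                PathologicalKey(int id) { this.id = id; }

                @Override public int hashCode() { return 42; }   // deliberately terrible
                @Override public boolean equals(Object o) {
                    return o instanceof PathologicalKey && ((PathologicalKey) o).id == id;
                }
                @Override public int compareTo(PathologicalKey other) {
                    return Integer.compare(id, other.id);        // lets the tree bin order keys
                }
            }

            public static void main(String[] args) {
                Map<PathologicalKey, Integer> map = new HashMap<>();
                for (int i = 0; i < 100_000; i++) {
                    map.put(new PathologicalKey(i), i);
                }
                // All 100,000 keys collide, but this lookup is a tree search,
                // not a walk over a 100,000-element chain.
                System.out.println(map.get(new PathologicalKey(99_999)));
            }
        }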


    And what array size is typically used in most languages' (JS, Go, etc.) hash table implementations?

    For Java (and probably for the others) the array size changes over the life-time of the hash table, as described above. In Java, there is an upper limit on the size of the array: Java arrays can only have 2³¹ - 1 elements.