I've been using hashmap for a long time and I always believe that its complexity is O(1).
I know that the key of hashmap is the hash function, which can map a key to a value. If the hash function is well designed, the collision can be kept at an acceptable level.
Today I read a hash function as below, which hashes string to hash code:
unsigned long hash(unsigned char *str)
{
unsigned long hash = 5381;
int c;
while (c = *str++)
hash = ((hash << 5) + hash) + c; /* hash * 33 + c */
return hash;
}
Obviously, there is a while
loop, so its complexity is O(n).
Now I'm confused. Is the complexity of hashmap always O(1)? Or the complexity depends on how we design the hash function, meaning if the hash function is not good enough, the complexity could be O(n) or even worse?
First off, a hashmap doesn't have complexity. Inserting into a hashmap does. Reading from a hashmap does. Operations have time complexity, object do not. Objects can have memory complexity, but that's not what we're talking about here.
Secondly, a hashmap doesn't always have O(1) even for reads. It has O(1) average time. The actual time can be up to O(n) for a single read, depending on how you resolve conflicts. For example, if you use a linked list conflict resolution, writes are always O(1) but reads can be up to O(n) if your hash function is bad. If you use a resize resolution, reads are always O(1), but writes can be O(n). Other solutions get other balances.
Third, this isn't a hash map. This is a hash function. It turns a complex value into a numerical one for comparison (more formally, it maps objects from a space of size N to a space of size M where N>M). That doesn't have any promise of being O(1), it's a completely separate concept from a hash map. A hash map uses a hash function to insert objects into a very large array, and thus gets O(1) time for reads and writes if the hash function is good enough that collisions are rare. The hash function itself can be any complexity, depending on the data and how it works. String hashes generally are O(n) on the string, because you want to try to make it unique (if you stop after say 4 characters, all strings with those first 4 would collide).