Search code examples
hashcryptographyhashtablehash-functioncryptographic-hash-function

Hash Table Confusion - How much space is needed for Hash Table with a good (eg. Cryptographic) Hash Function?


I am learning about Hash Tables, Hash Maps etc. I have just implemented a Hash Table in C, with operations: insert(HTable, key), delete(HTable, key), initialize(HTable) and search(HTable, key).

I would like to ask something. Since in a (proper) Hash Table the computed hashed indexes could be very large, doesn't this mean that the space consumed will be like INT_MAX (which is still O(n) of course), or more? I mean given the input element that we want to store in a hash table (ie insert it in), the insert() function would call the hash function which would then compute the hashed index for the element to go in. Thus it would use the hash function to find this index.

When we use the hash function to operate on the element, the hashed index could become very large. With a proper, for example cryptographic hash function, this index could become huge (they are using prime numbers with 300 digits - Diffie Hellman public key cryptography etc.), right? I know that in normal hash functions (such as the trivial ones beginners use to learn) we apply mod operation in order for the element to fit within the hash table's bounds, but in doing so, maybe we limit the hash function potential?

So to uniquely map an element to the hash table we must use a HUGE Hash Table. How are these cryptographic hash tables implemented? They must be completely secure, right? Even the Stack Overflow tag on "cryptographichashfunction" says that it is extremely unlikely to find two inputs that will map to the same element (as such the possibility of collisions is tiny). Wouldn't this require though a HUGE array to be stored in memory (or to disk)? Therefore, the memory consumption would be huge.

Of course, the time complexity is not a problem. We just see the start address of the hash table / array add it with the index and just go that place in memory to get the value (O(1) - search principle of Hash Table).

Am i wrong somewhere? Is there something i'm missing? I hope i made myself clear. So to conclude, i would like confirmation on this. Does a good hash function require a huge array (Hash Table) and as such a very large amount of memory to be properly implemented? Is so much space justified, or is there something i don't quite get? Thanks.


Solution

  • In general cryptographic hash values are not used for hash tables. Instead a fast hash is used. Of that hash value only as many bits may be used to tweak the size of the table. If multiple key values map to the same index then the values are stored in a separate structure, possibly with additional information to choose between the two.

    It is not required that the hash output is unique; the hash function output would be too large and the table required would certainly not fit in memory. Besides that, cryptographic hashes are generally quite slow.

    Cryptographic hash functions are usually build from operations also used in symmetric block ciphers. That means mixing and bitwise operators used in a large amount of rounds. Modular arithmetic, as used for e.g. RSA are commonly not used.

    All in all, the main thing is that the index generated doesn't need to be unique. Usually if one hash leeds to multiple values they are stored in a list or set where the key can be compared by value.