So I was curious when I got to know that dictionaries or associative arrays are usually implemented by hash tables. Upon reading about hash tables, I stumbled upon hash functions, I learned there are various hash functions such as md5, md6, sha-1 etc. What I was unable to find was which hash function is used by programming languages such as python, C++, java?
Those are.. not the same kind of 'hash function' D:
For hashtable hash functions, code must compute an appropriate hash based on object-data such that it conforms to equality requirements. It should also be "well distributed" and "fast". Most hashtable hashes are thus often 32-bit values using some form of a rolling/shifting computation. At the end of the day this hash is used to select from a much smaller pool of buckets.
Hashtable hashes are usually computed directly by (or with knowledge of) the objects be added to the hashtable - that is, generally, cryptographic hash functions are not involved in hashtables. A typical Java hashCode() function, defined on the object being added to the hashtable, for example might look like:
int hash = 7;
hash = 31 * hash + (int) int_field;
hash = 31 * hash + (str_field == null ? 0 : str_field.hashCode());
// etc.
return hash;
There are discussions on the choice of seed and multiplication values elsewhere.. but the take-way should be that most hashtable hash functions 1) directly derive from object state, applying 'tweaks' as prudent, and 2) are not designed to be "secure".
(Modern hashtable implementations often apply a "mixing function" to the generated hash value to mitigate degenerate hash function results and/or data poisoning attacks.)
On the the other hand, a cryptographic hash is designed to provide much stronger cryptographic requirements and have a much larger output space. While such a strong hash can be used for hashtables (after being derived from an object and then distilled down to a hash bucket), they are also slower to generate and usually unnecessary in context of a hash/dictionary.
Cryptographic hashes generally work on an arbitrary chunk of data or byte stream.
Hashtable hash desirable characteristics:
Cryptographic hashes have additional characteristics, beyond that of a hashtable hash:
Programming languages support a wide range of different cryptographic hash functions through their standard libraries and/or 3rd party libraries. A more well-known hash (eg. MD5/SHA-x) will generally have universal support while something more specialized (eg. MD6) may require additional effort to locate an implementation for.
On the other hand, as shown above, many hash table 'functions' are implemented directly on the object(s) involved in a hashtable, following a standard pattern, with some languages (and IDEs) providing assistance to reduce manual coding. As an example, C# provides a default reflection-based GetHashCode implementation for struct types.