Search code examples
data-structureshashhashtableuniversal-hashing

Is Universal family of hash functions only to prevent enemy attack?


If my intention is only to have a good hash function that spreads data evenly into all of the buckets, then I need not come up with a family of hash functions, I could just do with one good hash function, is that correct?

The purpose of having a family of hash functions is only to make it harder for the enemy to build a pathological data set as when we pick a hash function randomly, he/she has no information about which hash function is employed. Is my understanding right?

EDIT: Since someone is trying to close as unclear; This question is to know the real purpose of employing a Universal family of hash functions.


Solution

  • I could just do with one good hash function, is that correct?

    As you note later in your question, an "enemy" who knows which hash function you're using could prepare a pathological data set.

    Further, hashing is just the first stage in storing data into your table's buckets - if you're implementing open addressing / closed hashing, you also need to select alternative buckets to probe after collisions: simple approaches like linear and quadratic probing generally provide adequate collision avoidance, and are likely mathematically simpler and therefore faster than rehashing, but they don't maintain a probability of the next probe finding an unused bucket at the load factor. Rehashing with another good hash function (including another from a family of such functions) does, so if that's important to you you may prefer to use a family of hash functions.

    Note too that sometimes an in-memory hash table is used to say at which offsets/sectors on disk data is stored, so extra rehashing calculations with already-in-memory data may be far more appealing than a higher probability (with linear/quadratic probing) of waiting on disk I/O only to find another collision.