Search code examples
algorithmhashtable

What are the benefits of a cumulative component sum hashcode function over a regular summation of the ASCII values?


In the case of regular hash tables encoding text. Is it that you just get less collisions because the range of numbers is larger?

Edit: Cumulative component sum is the function which returns the factorial of the string ASCII values. ie s="string" -> s[0] + (s[0]+s[1])+ (s[0]+s[1]+s[2]) ... till len(s).

Regular sum is just s[0]+s[1]+s[2]...


Solution

  • Often a several English words use exactly the same letters, but in a different order. (Those words are anagrams of each other). (For example, angel / angle / glean ).

    Because order doesn't matter in simple addition, all of the anagrams of a word have the same sum. So using simple sums as your hash function always leads to a collision when two different keys are anagrams of each other.

    I've never heard the term "Cumulative component sum hashcode", but from your description it is the same as the second part of Fletcher's checksum.

    Using a hash function that gives different results for the same letters in a different order, such as the second part of Fletcher's checksum (or the entire Fletcher's checksum), leads to fewer collisions in a hash table.