Search code examples
cstring-searchrabin-karp

Rabin-Karp string search algorithm


My previous question pertained to the general string search algorithm. I am researching the Rabin-Karp algorithm and I have a function template like:

RabinKarpMatch(char *Text, char *Search_phrase,int radix,int prime)

I wanted to know how the values of the radix and prime will change according to the search_phrase and text? Or should I just give them arbitrary values for all the cases?


Solution

  • In Rabin-Karp algorithm radix and prime don't change during text processing. But choosing good radix and prime numbers has a critical importance. In worst case (almost impossible in practice) when all substrings of the text have the same hash code equal to template hash code, algorithm will work on O(nm) time, where n is text length and m is template length.

    General rule: Prime - must be small, and radix - must be convenient to use. I believe pairs like:

    (prime, radix)

    31, 2^64

    37, 2^64

    57, 2^64

    will be OK for you.

    In some implementations to minimize hash collisions more than one pair is used.