I am interested in implementing the Rabin-Karp algorithm to search for sub strings as stated on wiki: http://en.wikipedia.org/wiki/Rabin-Karp_string_search_algorithm. Not for homework, but for self-interest. I have implemented the Rabin-Karp algorithm (shown below) and it works. However, the performance is really, really bad!!! I understand that my hash function is basic. However, it seems that a simple call to strstr() will always outperform my function rabin_karp(). I can understand why - the hash function is doing more work than a simple char-by-char compare each loop. What am I missing here? Should the Rabin-Karp algorithm be faster than a call to strstr()? When is the Rabin-Karp algorithm best used? Hence my self-interest. Have I even implemented the algorithm right?
size_t hash(char* str, size_t i)
{
size_t h = 0;
size_t magic_exp = 1;
// if (str != NULL)
{
while (i-- != 0)
{
magic_exp *= 101;
h += magic_exp + *str;
++str;
}
}
return h;
}
char* rabin_karp(char* s, char* find)
{
char* p = NULL;
if (s != NULL && find != NULL)
{
size_t n = strlen(s);
size_t m = strlen(find);
if (n > m)
{
size_t hfind = hash(find, m);
char* end = s + (n - m + 1);
for (char* i = s; i < end; ++i)
{
size_t hs = hash(i, m);
if (hs == hfind)
{
if (strncmp(i, find, m) == 0)
{
p = i;
break;
}
}
}
}
}
return p;
}
You haven't implemented the hash correctly. The key to Rabin-Karp is to incrementally update the hash as the potential match moves along the string to be searched. As you've determined, if you recalculate the entire hash for each potential match position, things will be really slow.
For every case except for the first comparison, your hash function should take an existing hash, one new character, and one old character, and return an updated hash.