
String matching with Rabin-Karp: elementary number-theoretic notation


I am reading about string algorithms in Introduction to Algorithms by Cormen et al.

The following is text about some elementary number-theoretic notation.

Note: in the text below, read == as modular equivalence (congruence).

Given a well-defined notion of the remainder of one integer when divided by another, it is convenient to provide special notation to indicate equality of remainders. If (a mod n) = (b mod n), we write a == b (mod n) and say that a is equivalent to b, modulo n. In other words, a == b (mod n) if a and b have the same remainder when divided by n. Equivalently, a == b (mod n) if and only if n | (b - a). For example, 61 == 6 (mod 11). Also, -13 == 22 == 2 (mod 5).
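As a quick check of the two examples in the quoted passage, here is a minimal Python sketch. The helper name `congruent` is made up for illustration and is not from the book. It tests the condition n | (b - a) directly. Note that Python's `%` always returns a non-negative remainder for a positive modulus, which matches the book's "mod"; in languages such as C or Java the remainder of a negative number can be negative.

```python
def congruent(a: int, b: int, n: int) -> bool:
    """Return True if a == b (mod n), i.e. if n divides (b - a)."""
    return (b - a) % n == 0


print(congruent(61, 6, 11))     # True: 61 - 6 = 55 = 5 * 11
print(congruent(-13, 22, 5))    # True: 22 - (-13) = 35 = 7 * 5
print(congruent(22, 2, 5))      # True: 2 - 22 = -20 = -4 * 5
print(-13 % 5, 22 % 5, 2 % 5)   # 2 2 2 (all three share the remainder 2)
```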

The integers can be divided into n equivalence classes according to their remainders modulo n. The equivalence class modulo n containing an integer a is

[a]n = {a + kn : k ∈ Z}.

For example, [3]7 = {. . . , -11, -4, 3, 10, 17, . . .}; other denotations for this set are [-4]7 and [10]7.

Writing a belongs to [b]n is the same as writing a == b (mod n). The set of all such equivalence classes is

Zn = {[a]n : 0 <= a <= n - 1}.    (Eq. 1)
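To make the class notation concrete, the following Python sketch lists a few members of [3]7 and maps several integers to the representative in 0..n-1 that names their class in Zn. The function names `class_members` and `representative` are hypothetical helpers, chosen only for this illustration.

```python
def class_members(a: int, n: int, ks=range(-2, 3)) -> list:
    """A finite slice of [a]n = {a + k*n : k in Z}, for the given values of k."""
    return [a + k * n for k in ks]


def representative(a: int, n: int) -> int:
    """The unique x with 0 <= x <= n - 1 such that [a]n = [x]n."""
    return a % n  # Python's % is non-negative for n > 0


print(class_members(3, 7))        # [-11, -4, 3, 10, 17]
print(representative(-4, 7))      # 3, so [-4]7 is the same class as [3]7
print(representative(10, 7))      # 3, so [10]7 is the same class as [3]7
print(sorted({representative(a, 7) for a in range(-20, 21)}))  # [0, 1, 2, 3, 4, 5, 6]
```

Every integer lands on exactly one of the n representatives, so the set in Eq. 1 has exactly n distinct classes.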

My question about the text above: equation 1 says that "a" should be between 0 and n-1, but the example uses -4, which is not between 0 and 6. Why?

In addition, the text mentions that the Rabin-Karp algorithm uses equivalence of two numbers modulo a third number. What does this mean?


Solution

  • This is not a programming question, but never mind...

    Equation 1 says that "a" should be between 0 and n-1, but the example uses -4, which is not between 0 and 6. Why?

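    Because [-4]n is the same equivalence class as [x]n for some x such that 0 <= x < n. In the example, [-4]7 = [3]7, since -4 == 3 (mod 7). Equation 1 takes advantage of this fact to "neaten up" the definition: restricting a to 0..n-1 lists each class exactly once, so all the possibilities are distinct.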

    In addition, the text mentions that the Rabin-Karp algorithm uses equivalence of two numbers modulo a third number. What does this mean?

    The Rabin-Karp algorithm requires you to calculate a hash value for the substring you are searching for. When hashing, it is important to use a hash function that uses the whole of the available range of hash values, even for quite small strings. If your hash is a 32-bit integer and you just add the successive Unicode values together, your hash will usually be quite small, resulting in lots of collisions.

    So you need a function that can give you large answers. Unfortunately, this also exposes you to the possibility of integer overflow. Hence you use modular arithmetic: every hash is computed modulo some number q, so the values stay in a bounded range and the comparisons are not messed up by overflow. Two strings are then candidate matches when their hash values are equivalent modulo q; because different strings can share a hash, a candidate still has to be confirmed by comparing the characters.
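The idea in this answer can be sketched in Python as follows. This is an illustrative sketch, not the book's pseudocode verbatim: the function name `rabin_karp`, the base 256 and the prime modulus 1_000_003 are assumptions chosen for the example. Python integers do not overflow, so here the modulus only keeps the numbers small; in a language with fixed-width integers the same reduction is what prevents overflow.

```python
def rabin_karp(text: str, pattern: str, base: int = 256, q: int = 1_000_003) -> list:
    """Return all start indices where pattern occurs in text.

    Hashes are polynomial values of the characters, reduced modulo q.
    base and q are illustrative defaults, not prescribed values.
    """
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []

    # Weight of the leading character of a window: base^(m-1) mod q.
    high = pow(base, m - 1, q)

    # Hash of the pattern and of the first window of the text.
    p_hash = t_hash = 0
    for i in range(m):
        p_hash = (p_hash * base + ord(pattern[i])) % q
        t_hash = (t_hash * base + ord(text[i])) % q

    matches = []
    for s in range(n - m + 1):
        # Equal hashes modulo q only make s a candidate; confirm by comparison.
        if p_hash == t_hash and text[s:s + m] == pattern:
            matches.append(s)
        if s < n - m:
            # Roll the window: drop text[s], shift, append text[s + m].
            t_hash = ((t_hash - ord(text[s]) * high) * base + ord(text[s + m])) % q
    return matches


print(rabin_karp("abracadabra", "abra"))  # [0, 7]
```

A smaller q produces more hash collisions but never wrong answers, because every candidate is verified character by character.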