Search code examples
c++stringhashpolynomials

Modify hash value on replacing a single character in string (c++)


I am using a polynomial hash function to calculate the hash value of a string (consisting of only lowercase english letters) as follows:

int SZ = 105, P = 31;
long long M = 1e12 + 9;
vector <long long> pw;

pw.resize(SZ, 1);
for(int i = 1; i < SZ; i++) {
   pw[i] = (pw[i - 1] * P) % M;
}

long long calculateHash(string &s) {
    long long h = 0;
    
    for(int i = 0; i < s.length(); i++) {
        h = (h + (s[i] - 'a' + 1) * pw[i]) % M;
    }
    
    return h;
}

I don't want to re-calculate the hash of the entire string in O(N) time when I have to replace just one character at any given position. So inorder to do this in O(1) time, I do the following operation:

long long h1 = calculateHash(s1);
long long h2 = calculateHash(s2);

// Only one character differs in `s1` and `s2` at index `idx`

// Modifying hash for h1 to incorporate s2[idx] and removing s1[idx]
h1 = (h1 + ((s2[idx] - s1[idx]) * pw[idx])) % M;

Now when I check h1 == h2, it should be equal ideally, right? It does work for smaller strings but fails at times, I get negative values for h1, not sure if this is an overflow issue or ((s2[idx] - s1[idx]) * pw[idx]) is more negative causing h1 to fall below zero.

Could anyone, suggest a way to re-calculate the hash in O(1) time when only one character is changed? Thank you in advance!


Solution

  • In principle your idea of changing the resulting value ist correct, but what you need is a modulo operator, which result is always positiv, also for negativ input numbers.

    To emulate this behaviour with C++ modulo you could do the following:

    long long tmp=(h1 + ((s2[idx] - s1[idx]) * pw[idx])) % M;
    h1=(tmp+M)%M;
    

    The first line is the same operation you have done, an the second line make the result positiv, because tmp could not be less than -M after the C++ modulo operation. The additional modulo is needed to assure that the number keeps smaller that M, even if tmp was already positiv.