I am using a polynomial hash function to calculate the hash value of a string (consisting of only lowercase english letters) as follows:
int SZ = 105, P = 31;
long long M = 1e12 + 9;
vector <long long> pw;
pw.resize(SZ, 1);
for(int i = 1; i < SZ; i++) {
pw[i] = (pw[i - 1] * P) % M;
}
long long calculateHash(string &s) {
long long h = 0;
for(int i = 0; i < s.length(); i++) {
h = (h + (s[i] - 'a' + 1) * pw[i]) % M;
}
return h;
}
I don't want to re-calculate the hash of the entire string in O(N) time when I have to replace just one character at any given position. So inorder to do this in O(1) time, I do the following operation:
long long h1 = calculateHash(s1);
long long h2 = calculateHash(s2);
// Only one character differs in `s1` and `s2` at index `idx`
// Modifying hash for h1 to incorporate s2[idx] and removing s1[idx]
h1 = (h1 + ((s2[idx] - s1[idx]) * pw[idx])) % M;
Now when I check h1 == h2
, it should be equal ideally, right? It does work for smaller strings but fails at times, I get negative values for h1
, not sure if this is an overflow issue or ((s2[idx] - s1[idx]) * pw[idx])
is more negative causing h1 to fall below zero.
Could anyone, suggest a way to re-calculate the hash in O(1) time when only one character is changed? Thank you in advance!
In principle your idea of changing the resulting value ist correct, but what you need is a modulo operator, which result is always positiv, also for negativ input numbers.
To emulate this behaviour with C++ modulo you could do the following:
long long tmp=(h1 + ((s2[idx] - s1[idx]) * pw[idx])) % M;
h1=(tmp+M)%M;
The first line is the same operation you have done, an the second line make the result positiv, because tmp
could not be less than -M
after the C++ modulo operation. The additional modulo is needed to assure that the number keeps smaller that M
, even if tmp
was already positiv.