I am working on a coding problem in which I have to delete all occurrences of a substring T in a string S (keeping in mind that removing one occurrence of T in S may generate a new occurrence of T), and then to return the resulting string S after all deletions. The size of both S and T can be up to 10^6.
For example, if I have S = "aabcbcd" and T = "abc", then removing all occurrences of "abc" in S results in S = "d".
The sample solution to this problem involves building a string R from S one character at a time, and whenever the end of R matches T, we delete it from R (the comparison between the end of R and T is determined by string hashing).
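For reference, here's roughly how I understand that approach (my own reconstruction to make the question concrete, not the solution's actual code; the names pre, pw, thsh, hsh and the BASE/MOD constants are my own choices):

#include <cstdint>
#include <iostream>
#include <string>
#include <vector>

int main() {
    const std::uint64_t MOD = 1'000'000'007ULL, BASE = 131;
    std::string S = "aabcbcd", T = "abc";

    // Hash of T, and powers of BASE up to |S|.
    std::uint64_t thsh = 0;
    for (char c : T) thsh = (thsh * BASE + c) % MOD;
    std::vector<std::uint64_t> pw(S.size() + 1, 1);
    for (std::size_t i = 1; i <= S.size(); ++i) pw[i] = pw[i - 1] * BASE % MOD;

    std::string R;
    std::vector<std::uint64_t> pre{0};   // pre[i] = hash of the first i characters of R
    for (char c : S) {
        R.push_back(c);
        pre.push_back((pre.back() * BASE + c) % MOD);
        if (R.size() >= T.size()) {
            // Rolling hash of the last T.size() characters of R.
            std::uint64_t hsh =
                (pre[R.size()] + MOD - pre[R.size() - T.size()] * pw[T.size()] % MOD) % MOD;
            /* If the end of R and T match, truncate the end of R (and the hash prefix). */
            if (hsh == thsh && R.substr(R.size() - T.size()) == T) {
                R.resize(R.size() - T.size());
                pre.resize(pre.size() - T.size());
            }
        }
    }
    std::cout << R << '\n';              // prints "d" for this example
}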
The solution says that
Since this deletion is at the end of R this is just a simple O(1) resize operation.
However, according to https://m.cplusplus.com/reference/string/string/resize/ the time complexity of string::resize is linear in the new string length. Ben Voigt confirms this in Why is string::resize linear in complexity?.
Also, the solution's code uses string::substr to double-check that the end of R and T really match (since hash(end of R) == hash(T) does not guarantee that the end of R equals T):
/* If the end of R and T match truncate the end of R (and associated hash arrays). */
if (hsh == thsh && R.substr(R.size() - T.size()) == T) {
//...
}
Once again, https://m.cplusplus.com/reference/string/string/substr/ says that string::substr has linear time complexity.
Even if string::substr weren't linear, comparing the two strings directly would still make the check linear in the size of T.
If this is true, wouldn't the time complexity of the solution be at least O(S.length()*T.length()), rather than the O(S.length()) claimed by the solution? Any help is appreciated!
string::resize isn't always linear. If you're expanding a string, it's linear in the number of characters copied, which is potentially the total number in the resulting string (but could be less, if the string already has enough space for the character(s) you add, so it only has to write the new characters).
Using resize to reduce the size of a string will normally take constant time. In simplified form (and leaving out a lot of other "stuff"), string can look something like this:
class string {
    char *data;
    size_t allocated_size;
    size_t in_use_size;
public:
    void resize(size_t new_size) {
        if (new_size < in_use_size) {
            // Shrinking: just record the new length and re-terminate. O(1).
            in_use_size = new_size;
            data[in_use_size] = '\0';
        } else {
            // code to expand string to desired size in O(n) time
        }
    }
    // ...
};
So although it'll be linear when expanding the string, it'll typically have constant complexity when reducing the size.
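If it helps to see that concretely, here's a tiny check (my own example, and implementation-dependent rather than guaranteed by the standard): on the common library implementations a shrinking resize reallocates nothing and copies nothing, it just records the smaller length.

#include <iostream>
#include <string>

int main() {
    std::string R(1'000'000, 'x');    // large enough to be heap-allocated
    std::size_t cap = R.capacity();
    R.resize(R.size() - 3);           // drop the last 3 characters, like truncating T from R
    // Typically no reallocation or copy happens here and the capacity is untouched.
    std::cout << R.size() << ' ' << (R.capacity() == cap) << '\n';  // usually prints "999997 1"
}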
As for using substr, yes, in the case where the hashes match, substr itself will be linear (it creates a new string object) and you're going to do a linear-complexity comparison. I'd guess they're pretty much just presuming hash collisions are rare enough to ignore, so for most practical purposes this only happens when you have an actual match. And each actual match removes T.size() characters from R, so across the whole run the time spent on those verifications is bounded by the number of characters ever appended to R, which keeps the total at O(S.length()).
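Incidentally, if you want to avoid even the temporary string that substr builds (the character-by-character comparison is still linear, of course), the positional overload of std::string::compare does the same check in place. A small sketch, with tail_matches being a hypothetical helper name of my own:

#include <string>

// Hypothetical helper: does the end of R equal T, without creating a temporary string?
bool tail_matches(const std::string& R, const std::string& T) {
    return R.size() >= T.size() &&
           R.compare(R.size() - T.size(), T.size(), T) == 0;
}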