The problem:
Given a string s. For each i from 1 to |s|, find the number of occurrences of its prefix of length i in the string.
Input:
The first line of input contains an integer q (1≤q≤10^5) — the number of datasets in the test.Each dataset consists of a string s. The length of the string s is from 1 to 10^6 characters. The string consists exclusively of lowercase Latin alphabet letters.
The sum of the lengths of strings s across all q datasets in the test does not exceed 10^6.
Output:
For each dataset, output |s| integers c1, c2, ..., c|s|, where c[i] is the number of occurrences of the prefix of length i in the string s.Example
Input:
5 abacaba eeeee abcdef ababababa kekkekkek
Output:
4 2 2 1 1 1 1 5 4 3 2 1 1 1 1 1 1 1 5 4 4 3 3 2 2 1 1 6 3 3 2 2 2 1 1 1
The task must be solved exclusively using the Z-function, and the total time for a string of length 10^6 characters should not exceed 2 seconds.
My solution looks like this:
#include <iostream>
#include <string>
#include <vector>
std::vector<int> ZFunc(const std::string& s) {
const int sz = s.size();
std::vector<int> z(sz, 0);
for (int i = 1, l = 0, r = 0; i != sz; ++i) {
if (r >= i)
z[i] = std::min(z[i - l], r - i + 1);
while (z[i] + i < sz && s[i + z[i]] == s[z[i]])
z[i]++;
if (z[i] > r - i + 1) {
l = i;
r = i + z[i] - 1;
}
}
return z;
}
int main() {
int n;
std::cin >> n;
std::vector<std::vector<int>> res(n);
for (int k = 0; k != n; ++k) {
std::string s;
std::cin >> s;
res[k].resize(s.size(), 1);
std::vector<int> z = ZFunc(s);
for (int i = 1; i != z.size(); ++i) {
while (z[i]--)
res[k][z[i]]++;
}
}
for (const auto& ivec : res) {
for (int i : ivec)
std::cout << i << " ";
std::cout << std::endl;
}
return 0;
}
Its only issue is that I am exceeding the allowed time on tests, and I do not know how to optimize this algorithm. I suspect that the problem lies in strings of the form aaaaaa...aaabcd, where the number of identical characters is quite large, causing the algorithm to have a complexity of O(n^2).
Algorithm Description:
The essence of the proposed algorithm is as follows: we find the Z-function for each string. The i-th value of the Z-function represents the length of the matching prefix and substring at position i. More about the Z-function can be found here.
For each string, a prefix of length i occurs at least once, so initially, res[k].resize(s.size(), 1);
is set to 1 for all prefixes of the string.
Since the task is to specify, for all i from 1 to |s|, the number of occurrences of a prefix of length i in the string s, when we look at a non-zero value of the Z-function at position i, it means that the prefix of length z[i] matches the substring of length z[i], starting at position i. This means we have found at least one more occurrence for all substrings of the prefix of length i.
For example:
s = abacaba
z = 0010301
When we reach i = 2, z[i] = 1, we can confidently increment the result res[0]++, as s[2]=s[0]='a'. We have found the first occurrence of the prefix of length 1. Next, z[4]=3, which means we have found a match for the prefix 'aba', including 'a', 'ab', and 'aba'. Therefore, for prefixes of length 3, 2, and 1, we add +1.
This is done in the following lines:
while (z[i]--)
res[k][z[i]]++;
Here's right way to solve this problem. Instead of my while loop, I could use cumulative sums for frequencies of prefix length i: answer