This is the problem:
Suppose S is a set of strings and we know the total length of all strings in S in n. We must find a data structure with space O(n) that finds LCP(s,t) in O(1) which LCP is the longest common prefix between strings s,t.
At first I thought I could use hashing since we can check numbers in constant time and find sub-strings in constant time if we prehashed strings.But I don't think this would work since it needs more space and after a little searching I found out that the solution probably lies in using Trie's,Suffix arrays and maybe LCA and RMQ. I think I'm close to the answer but don't know how these concepts can work together to make a data structure that give LCP fast!
Thanks for reading
I think that I know the answer they are looking for.
First, construct a trie for all of the strings. Each node in the trie can include a pointer to a string starting with that prefix, and a length. Map each string to the final node in the trie that that string winds up on.
Now when given a pair of strings (which presumably you are told as string i
and string j
) the problem of returning a string is a question of finding the least common ancestor, then returning the pair (pointer_to_start_of_string, length)
.
But a trie can be written as a tree, and then Tarjan's off line lowest Common Ancestors Algorithm (see https://www.geeksforgeeks.org/tarjans-off-line-lowest-common-ancestors-algorithm/) can be used to preprocess that tree to answer LCA questions very quickly.
It is technically not O(1)
. However it is O(inverse_ackermann(n))
which can be treated as a fairly small constant for any computer that fits in the observable universe.