Search code examples
algorithmdata-structuresprefix

finding LCP of two strings in constant time and linear space


This is the problem:
Suppose S is a set of strings and we know the total length of all strings in S in n. We must find a data structure with space O(n) that finds LCP(s,t) in O(1) which LCP is the longest common prefix between strings s,t.


At first I thought I could use hashing since we can check numbers in constant time and find sub-strings in constant time if we prehashed strings.But I don't think this would work since it needs more space and after a little searching I found out that the solution probably lies in using Trie's,Suffix arrays and maybe LCA and RMQ. I think I'm close to the answer but don't know how these concepts can work together to make a data structure that give LCP fast!


Thanks for reading


Solution

  • I think that I know the answer they are looking for.

    First, construct a trie for all of the strings. Each node in the trie can include a pointer to a string starting with that prefix, and a length. Map each string to the final node in the trie that that string winds up on.

    Now when given a pair of strings (which presumably you are told as string i and string j) the problem of returning a string is a question of finding the least common ancestor, then returning the pair (pointer_to_start_of_string, length).

    But a trie can be written as a tree, and then Tarjan's off line lowest Common Ancestors Algorithm (see https://www.geeksforgeeks.org/tarjans-off-line-lowest-common-ancestors-algorithm/) can be used to preprocess that tree to answer LCA questions very quickly.

    It is technically not O(1). However it is O(inverse_ackermann(n)) which can be treated as a fairly small constant for any computer that fits in the observable universe.