Search code examples
algorithmsuffix-tree

Search in implicit suffix tree constructed by Ukkonen algorithm


I encountered a problem which requires a data structure that will hold a string S and allow me to:

  1. Check if word W is a subword of S in O(|W|) time
  2. Find longest suffix of S that is also a prefix of given word U in O(|U|) time
  3. Append string K at the end of S in O(|K|) time

I figured out that suffix trees constructed by Ukkonen algorithm are what I'm searching for. Algorithm is described as "On-line construction of suffix trees", and I have a problem with "online" part: after insertion of each character algorithm constructs an implicit suffix tree, which can be converted to explicit in final step. But what if I want to use implicit tree for searching before that step? "online" suggests that it's possible after inserting any prefix of analyzed string, but I can't find any example of even simpliest algorithm that operates on implicit tree.

My question is: How do I search for a string in implicit suffix tree?

EDIT: I accepted a very good answer that solves my problem, but in the meantime I managed to figure out a simplier solution to 2: It suffices to search for U in suffix of S of length |U| using KMP algorithm, and last number of matched characters will be string overlap.


Solution

  • There is only one difference between implicit suffix tree and explicit suffix tree: it does not contain end-of-string markers (and does not contain any branches corresponding to these end-of-string markers).

    This means there is no difference where to search for a substring - in implicit suffix tree or in explicit suffix tree. Since implicit suffix tree contains less unnecessary branches, this guarantees even more efficient (but still linear-time) algorithm for substring search.

    So requirement #1 is satisfied automatically: just search suffix tree from the root and choose branches that match given word.

    As for requirement #2, I think, you cannot satisfy it with the same implicit suffix tree. Because you need end-of-string markers to work with suffixes.

    But you could do it in O(|U|) time with separate (explicit) suffix tree for given word U. The trick is to reverse this word prior to constructing its suffix tree. To find the longest suffix of S that is also a prefix of U, use this separate suffix tree to find the longest prefix of the reversed string S that is also a suffix of the reversed string U. Just search this suffix tree from the root, choose branches that match the reversed string S, and remember the latest node with end-of-string marker. Then reverse the string on the path from root to this node (or determine length of this path and copy substring of the same length from the tail of S).