Search code examples
computer-sciencefinite-automatasuffix-tree

finite state or suffix tree


I want to create an algorithm that finds substrings inside a string wich exist in an alphabet. For example in a string "abcdefhhhasddasdabbba" i want to find substrings that are spawned by alphabet {'a','b'}

So my output will be like: ab ,a ,abbba

If I use finite state I have to create exact finite-states that include my outputs so i will have to take all possible combinations in length of my sting input which I do not think it's efficient at all.

If I use suffix tree then how could I find substrings which may not be prefix or post fix inside the tree? Should I use for each node an array that keeps data for the subtree and then check if the array contains characters which are not included in alphabet ?

Edit:

Complexity is crusial.


Solution

  • This can be done by trivial loop, no data structures needed.

    for each letter in word
      if letter in alphabet, then add it to a current "X"
      otherwise emit current "X" and set "X" to empty string
    

    python example

    word = 'aababadadadab'
    alphabet = { 'a', 'b' }
    
    X = ''
    for letter in word + '$': 
      if letter in alphabet:
        X += letter
      else:
        print X
        X = ''
    

    output:

    aababa
    a
    a
    ab
    

    I use '$' as a special character from outside of alphabet, for simplier code