I want to create an algorithm that finds substrings inside a string wich exist in an alphabet. For example in a string "abcdefhhhasddasdabbba" i want to find substrings that are spawned by alphabet {'a','b'}
So my output will be like: ab ,a ,abbba
If I use finite state I have to create exact finite-states that include my outputs so i will have to take all possible combinations in length of my sting input which I do not think it's efficient at all.
If I use suffix tree then how could I find substrings which may not be prefix or post fix inside the tree? Should I use for each node an array that keeps data for the subtree and then check if the array contains characters which are not included in alphabet ?
Edit:
Complexity is crusial.
This can be done by trivial loop, no data structures needed.
for each letter in word
if letter in alphabet, then add it to a current "X"
otherwise emit current "X" and set "X" to empty string
python example
word = 'aababadadadab'
alphabet = { 'a', 'b' }
X = ''
for letter in word + '$':
if letter in alphabet:
X += letter
else:
print X
X = ''
output:
aababa
a
a
ab
I use '$' as a special character from outside of alphabet, for simplier code