Deterministic automata to find number of subsequences in string ? How can I construct a DFA to find number of occurence string as a subsequence in another string?
eg. In "ssstttrrriiinnngggg" we have 3 subsequences which form string "string" ?
also both string to be found and to be searched only contain characters from specific character Set . I have some idea about storing characters in stack poping them accordingly till we match , if dont match push again . Please tell DFA solution ?
If you wish to count the number of overlapping sequences then you simply construct a DFA that matches the string, e.g.
1 -(if see s)-> 2 -(if see t)-> 3 -(if see r)-> 4 -(if see i)-> 5 -(if see n)-> 6 -(if see g)-> 7
and then compute the number of ways of being in each state after seeing each character using dynamic programming. See the answers to this question for more details.
DP[a][b] = number of ways of being in state b after seeing the first a characters
= DP[a-1][b] + DP[a-1][b-1] if character at position a is the one needed to take state b-1 to b
= DP[a-1][b] otherwise
Start with DP[0][b]=0 for b>1 and DP[0][1]=1.
Then the total number of overlapping strings is DP[len(string)][7]
If you are counting the number of non-overlapping sequences, then if we assume that the characters in the pattern to be matched are distinct, we can use a slight modification:
DP[a][b] = number of strings being in state b after seeing the first a characters
= DP[a-1][b] + 1 if character at position a is the one needed to take state b-1 to b and DP[a-1][b-1]>0
= DP[a-1][b] - 1 if character at position a is the one needed to take state b to b+1 and DP[a-1][b]>0
= DP[a-1][b] otherwise
Start with DP[0][b]=0 for b>1 and DP[0][1]=infinity.
Then the total number of non-overlapping strings is DP[len(string)][7]
This approach will not necessarily give the correct answer if the pattern to be matched contains repeated characters (e.g. 'strings').