Number of subsequences of a string with K distinct characters


I want to solve a problem and I need some help because my code doesn't work.

OK, so I have a sequence S (input data) and I need to find the number of subsequences whose number of distinct characters is equal to K (input data).

Example:

For S = abcaa and K = 3, the answer is 5.
s1 = abc
s2 = abca
s3 = abcaa
s4 = bca
s5 = bcaa

I thought about it for a while and searched the internet for answers, but I didn't find what I need. I think I must compute the frequency of every character in the sequence, but I don't know what to do after that...


Solution

  • Not the most efficient solution, but here you go: iterate through your string and, for every starting position, do two things. First, iterate from that position until you have found K different characters (use a frequency array for that) or until you reach the end of the string. If you found such a subsequence, keep iterating from the position where you stopped + 1 and, while the characters you find are already in your frequency array and you haven't reached the end of the string, count the letters you find. Add 1 to that count (for the first subsequence) and you have all the subsequences that start at that position. Then increment your first index and continue.
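
The steps above could be sketched in Python roughly as follows. This is only an illustration under a few assumptions: "subsequence" is read as a contiguous run of characters (which is how the answer treats it), the function name `count_with_k_distinct` is made up for this example, and a `set` stands in for the frequency array since only presence matters here.

```python
def count_with_k_distinct(s: str, k: int) -> int:
    """Count contiguous runs of s that contain exactly k distinct characters.

    Hypothetical helper following the answer's idea; worst case O(n^2).
    """
    if k <= 0:
        return 0  # assumption: empty runs are not counted

    n = len(s)
    total = 0
    for start in range(n):
        seen = set()  # plays the role of the frequency array
        end = start
        # Step 1: extend until k distinct characters are seen or s ends.
        while end < n and len(seen) < k:
            seen.add(s[end])
            end += 1
        if len(seen) < k:
            # The rest of s has fewer than k distinct characters,
            # so no later starting position can succeed either.
            break
        # Step 2: s[start:end] is the shortest valid run from this start.
        # Every further character that is already in `seen` gives one more.
        count = 1
        while end < n and s[end] in seen:
            count += 1
            end += 1
        total += count
    return total


print(count_with_k_distinct("abcaa", 3))  # the question's example: 5
```

For `abcaa` and K = 3, position 0 contributes `abc`, `abca`, `abcaa`, position 1 contributes `bca`, `bcaa`, and position 2 can no longer reach three distinct characters, so the loop stops.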