
Recursively generate ordered substrings from an ordered sequence of chars?


Edited after getting answers

Some excellent answers here. I like Josh's because it is so clever and uses C++. However, I decided to accept Dave's answer because of its simplicity and recursion. I tested both, and they produced identical, correct results (although in a different order). So thanks again, everyone.


Say I have a string s of chars s[0]:s[N], where each char satisfies s[i] <= s[i+1]. For example, the string

aaacdddghzz

I want to generate all combinations of substrings while keeping the same relationship between chars.

So for example I would get

a
aa
aaa
ad
aad
aaad
add
aadd
aaadd
addd
aaddd
aaaddd
d
dd
ddd
.
.
.
ac
aac
.
.
.
acdddghzz
aacdddghzz
aaacdddghzz

But not

ca
hdz
...etc

Now I know how to work out how many combinations there are. You create a histogram of the frequency of letters in the string. So in the above example that would be

For string aaacdddghzz

a=3
c=1
d=3
g=1
h=1
z=2

and the formula is (a+1)(c+1)(d+1)(g+1)(h+1)(z+1) = 4*2*4*2*2*3 = 384. There are 384 substrings (counting the empty string) that keep the s[i] <= s[i+1] relationship.
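As a quick check of that formula, here is a minimal C++ sketch. The function name `count_ordered_subsequences` is made up for illustration, and the code assumes the input is already sorted, so equal letters sit next to each other and the histogram can be read off as runs.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper (not from the original post): counts the ordered
// subsequences of a sorted string, including the empty one.
std::uint64_t count_ordered_subsequences(const std::string& s) {
    // Assumption: s is sorted, so each letter forms one contiguous run.
    assert(std::is_sorted(s.begin(), s.end()));

    std::uint64_t total = 1;
    std::size_t i = 0;
    while (i < s.size()) {
        // Measure the run of characters equal to s[i].
        std::size_t run = 1;
        while (i + run < s.size() && s[i + run] == s[i]) ++run;
        total *= run + 1;  // choose 0..run copies of this letter
        i += run;
    }
    return total;
}
```

Calling `count_ordered_subsequences("aaacdddghzz")` should give 384, matching the product above. The result can overflow 64 bits for long strings with many distinct letters.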

So the question is: how do I generate those 384 substrings recursively? Actually, an iterative method would be just as good, maybe better, as large strings with many unique chars might cause the stack to overflow. This sounds like homework, but it isn't. I'm just useless at coming up with algorithms like this. I use C++, but pseudocode would be fine.


Solution

  • The following is a recursive algorithm to generate all subsequences.

    /* in C -- I hope it will be intelligible */
    
    #include <stdio.h>
    
    static char input[] = "aaabbbccc";
    static char output[sizeof input];
    
    /* i is the current index in the input string
     * j is the current index in the output string
     */
    static void printsubs(int i, int j) {
        /* print the current output string */
        output[j] = '\0';
        printf("%s\n", output);
        /* extend the output by each character from each remaining group and call ourselves recursively */
        while(input[i] != '\0') {
            output[j] = input[i];
            printsubs(i + 1, j + 1);
            /* find the next group of characters */
            do ++i;
            while(input[i] == input[i - 1]);
        }
    }
    
    int main(void) {
        printsubs(0, 0);
        return 0;
    }
    

    If your interest is merely in counting how many subsequences there are, you can do it much more efficiently. Simply count how many of each letter there are, add 1 to each count, and multiply the results together. In the above example, there are 3 a's, 3 b's, and 3 c's, for (3 + 1) * (3 + 1) * (3 + 1) = 64 subsequences, including the empty one. The reason this works is that you can choose between 0 and 3 a's, 0 and 3 b's, and 0 and 3 c's, and all of these choices are independent.
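The question also asks about an iterative method. Because the per-letter choices are independent, one possible approach is to treat the counts as the digits of a mixed-radix counter and step through every digit combination. The following is a minimal C++ sketch under that assumption, not the accepted answer's method. The name `ordered_subsequences` is hypothetical, the input is assumed to be sorted, and the results are collected in a vector only to keep the example short.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (not from the original post): returns every ordered
// subsequence of a sorted string, the empty string included, without recursion.
std::vector<std::string> ordered_subsequences(const std::string& s) {
    // Assumption: s is sorted, so each letter forms one contiguous run.
    assert(std::is_sorted(s.begin(), s.end()));

    // Histogram as (letter, count) pairs, built from runs of equal characters.
    std::vector<std::pair<char, std::size_t>> groups;
    for (char c : s) {
        if (!groups.empty() && groups.back().first == c) {
            ++groups.back().second;
        } else {
            groups.emplace_back(c, std::size_t{1});
        }
    }

    // pick[k] = how many copies of groups[k].first the current output uses.
    std::vector<std::size_t> pick(groups.size(), 0);
    std::vector<std::string> result;
    while (true) {
        // Build the string for the current digit combination.
        std::string out;
        for (std::size_t k = 0; k < groups.size(); ++k) {
            out.append(pick[k], groups[k].first);
        }
        result.push_back(out);

        // Advance like an odometer: reset every leading digit that is at its
        // maximum, then bump the first digit that is not.
        std::size_t k = 0;
        while (k < groups.size() && pick[k] == groups[k].second) {
            pick[k++] = 0;
        }
        if (k == groups.size()) break;  // every digit wrapped: all done
        ++pick[k];
    }
    return result;
}
```

For the input `"aad"` this sketch yields the empty string, `a`, `aa`, `d`, `ad`, `aad`, in that order. For long inputs it would make more sense to print each string inside the loop instead of storing them all, since the number of results grows as the product of the (count + 1) terms.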