Search code examples
pythonpython-itertools

Finding all sequences of A, B such that have a specified number of each element


For example, given the two letters A and B, I'd like to generate all strings of length n that have x A's and y B's.

I'd like this to be done efficiently. One way that I've considered is to build a length x list of A's, and then insert y B's into the list every possible way. But insertion into a python list is linear, so this method would suck as the list gets big.

PERFORMANCE GOAL (this may be unreasonable, but it is my hope): Generate all strings of length 20 with equal numbers of A and B in time less than a minute.

EDIT: Using permutations('A' * x, 'B' * y) has been suggested. While not a bad idea, it's wasting a lot. If x = y = 4, you'd generate the string 'AAAABBBB' many times. Is there a better way that might generate each string only once? I've tried code to the effect of set(permutations('A' * x, 'B' * y)) and it is too slow.


Solution

  • Regarding your concerns with the performance, here is an actual generator implementation of your idea (without insert). It finds the positions for B and fill the list accordingly.

    import itertools
    
    def make_sequences(num_a, num_b):
        b_locations = range(num_a+1)
        for b_comb in itertools.combinations_with_replacement(b_locations, num_b):
            result = []
            result_a = 0
            for b_position in b_comb:
                while b_position > result_a:
                    result.append('A')
                    result_a += 1
                result.append('B')
            while result_a < num_a:
                result.append('A')
                result_a += 1
            yield ''.join(result)
    

    It does perform better. Comparing with the Greg Hewgill's solution (naming it make_sequences2):

    In : %timeit list(make_sequences(4,4))
    10000 loops, best of 3: 145 us per loop
    
    In : %timeit make_sequences2(4,4)
    100 loops, best of 3: 6.08 ms per loop
    

    Edit

    A generalized version:

    import itertools
    
    def insert_letters(sequence, rest):
        if not rest:
            yield sequence
        else:
            letter, number = rest[0]
            rest = rest[1:]
            possible_locations = range(len(sequence)+1)
            for locations in itertools.combinations_with_replacement(possible_locations, number):
                result = []
                count = 0
                temp_sequence = sequence
                for location in locations:
                    while location > count:
                        result.append(temp_sequence[0])
                        temp_sequence = temp_sequence[1:]
                        count += 1
                    result.append(letter)
                if temp_sequence:
                    result.append(temp_sequence)
                for item in insert_letters(''.join(result), rest):
                    yield item
    
    def generate_sequences(*args):
        '''
        arguments : squence of (letter, number) tuples
        '''
        (letter, number), rest = args[0], args[1:]
        for sequence in insert_letters(letter*number, rest):
            yield sequence
    

    Usage:

    for seq in generate_sequences(('A', 2), ('B', 1), ('C', 1)):
        print seq
    
    # Outputs
    # 
    # CBAA
    # BCAA
    # BACA
    # BAAC
    # CABA
    # ACBA
    # ABCA
    # ABAC
    # CAAB
    # ACAB
    # AACB
    # AABC