For example, given the two letters A and B, I'd like to generate all strings of length n that have x A's and y B's.
I'd like this to be done efficiently. One way that I've considered is to build a length x list of A's, and then insert y B's into the list every possible way. But insertion into a python list is linear, so this method would suck as the list gets big.
PERFORMANCE GOAL (this may be unreasonable, but it is my hope): Generate all strings of length 20 with equal numbers of A and B in time less than a minute.
EDIT: Using permutations('A' * x, 'B' * y) has been suggested. While not a bad idea, it's wasting a lot. If x = y = 4, you'd generate the string 'AAAABBBB' many times. Is there a better way that might generate each string only once? I've tried code to the effect of set(permutations('A' * x, 'B' * y)) and it is too slow.
Regarding your concerns with the performance, here is an actual generator implementation of your idea (without insert
). It finds the positions for B
and fill the list accordingly.
import itertools
def make_sequences(num_a, num_b):
b_locations = range(num_a+1)
for b_comb in itertools.combinations_with_replacement(b_locations, num_b):
result = []
result_a = 0
for b_position in b_comb:
while b_position > result_a:
result.append('A')
result_a += 1
result.append('B')
while result_a < num_a:
result.append('A')
result_a += 1
yield ''.join(result)
It does perform better. Comparing with the Greg Hewgill's solution (naming it make_sequences2
):
In : %timeit list(make_sequences(4,4))
10000 loops, best of 3: 145 us per loop
In : %timeit make_sequences2(4,4)
100 loops, best of 3: 6.08 ms per loop
Edit
A generalized version:
import itertools
def insert_letters(sequence, rest):
if not rest:
yield sequence
else:
letter, number = rest[0]
rest = rest[1:]
possible_locations = range(len(sequence)+1)
for locations in itertools.combinations_with_replacement(possible_locations, number):
result = []
count = 0
temp_sequence = sequence
for location in locations:
while location > count:
result.append(temp_sequence[0])
temp_sequence = temp_sequence[1:]
count += 1
result.append(letter)
if temp_sequence:
result.append(temp_sequence)
for item in insert_letters(''.join(result), rest):
yield item
def generate_sequences(*args):
'''
arguments : squence of (letter, number) tuples
'''
(letter, number), rest = args[0], args[1:]
for sequence in insert_letters(letter*number, rest):
yield sequence
Usage:
for seq in generate_sequences(('A', 2), ('B', 1), ('C', 1)):
print seq
# Outputs
#
# CBAA
# BCAA
# BACA
# BAAC
# CABA
# ACBA
# ABCA
# ABAC
# CAAB
# ACAB
# AACB
# AABC