Search code examples
pythoncombinationspython-itertools

Find all possible combinations with replacement in a string


Here's the code I have so far:

from itertools import combinations, product

string = "abcd012345"

char = "01268abc"

for i, j in combinations(tuple(range(len(string))), 2):
    for char1, char2 in product(char, char):
        print(string[:i] + char1 + string[i+1:j] + char2 + string[j+1:])

So, the string is abcd012345, and I change two characters in order to find all possible combinations. The characters are 01268abc. In this example, we got 2880 combinations.

The goal is to set what character(s) will be in a specified place of the string. Example below:

from itertools import combinations, product

string = "abcd012345"
# place   0123456789      

char_to change_for_place0 = "ab02"
char_to change_for_place1 = "14ah"
char_to change_for_place2 = "94nf"
char_to change_for_place3 = "a"
char_to change_for_place4 = "9347592"
char_to change_for_place5 = "93478nvg"
char_to change_for_place6 = "b"
char_to change_for_place7 = ""
char_to change_for_place8 = ""
char_to change_for_place9 = "84n"

for i, j in combinations(tuple(range(len(string))), 2):
    for char1, char2 in product(char, char):
        print(string[:i] + char1 + string[i+1:j] + char2 + string[j+1:])

Note:

  1. some places can be empty and stays the same as in the place 7 and 8.
  2. the number of places will be 64.
  3. the number of characters to change will be 4, not 2 as in the example.

I will enjoy to learn from your solutions and ideas, thank you.


Solution

  • This boils down to adding the current letter of each position inside string to your current replacements for that position and then creating all possible combinations of those options:

    from itertools import combinations, product
    
    string = "abcd012345"
    
    # must be of same lenght as (string), each entry correspond to the same index in string
    p = ["ab02", "14ah", "94nf", "a", "9347592", "93478nvg", "b", "", "", "84n"]  
    
    errmsg = f"Keep them equal lenghts: '{string}' ({len(string)}) vs {p} ({len(p)})"
    assert len(p)==len(string), errmsg
    
    # eliminates duplicates from letter in string + replacments due to frozenset()
    d = {idx: frozenset(v + string[idx]) for idx, v in enumerate(p)} 
    
    # creating this list take memory
    all_of_em = [''.join(whatever) for whatever in product(*d.values())]
    
    # if you hit a MemoryError creating the list, write to a file instead
    # this uses a generator with limits memory usage but the file is going
    # to get BIG 
    # with open("words.txt","w") as f:
    #    for w in (''.join(whatever) for whatever in product(*d.values())):
    #        f.write(w+"\n")
    
    print(*all_of_em, f"\n{len(all_of_em)}", sep="\t")
      
    

    Output:

    2and2g234n      2and2g2348      2and2g2344      2and2g2345      2and27b34n      
    [...snipp...]
    249d99234n      249d992348      249d992344      249d992345      
    100800
    

    If you value the order of letters in your replacements, use

    d = {idx: (v if string[idx] in v else string[idx]+v) for idx, v in enumerate(p)} 
    

    instead:

    abcd012345      abcd012348      [...]     2hfa2gb344      2hfa2gb34n      115200
    

    The difference in amounts is due to duplicate 9 in "9347592" wich is removed using frozensets.


    To get only those that changed fewer then 5 things:

    # use a generator comprehension to reduce memory usage
    all_of_em = (''.join(whatever) for whatever in product(*d.values()))
    
    # create the list with less then 5 changes from the generator above
    fewer = [w for w in all_of_em if sum(a != b for a, b in zip(w, string)) < 5]