Search code examples
pythonpython-3.xcombinationsproductpython-itertools

Combinations by changing 3 or more places in a string


The code below takes a string, then in p = there is a mapping for every index that can be changed and with what characters. For example d1 is atp[0], so the character a (at string[0]) can be replaced by d or 1. The number of characters that have to change at a time is limited to the number 3.

from itertools import combinations, product

string = "abc123" 

p = ["d1", "c3", "", "", "0", "56"]

d = {idx: (v if string[idx] in v else string[idx]+v) for idx, v in enumerate(p)} 

all_of_em = (''.join(whatever) for whatever in product(*d.values()))

fewer = [w for w in all_of_em if sum(a != b for a, b in zip(w, string)) == 3]

with open("list.txt","w") as f: 
    for w in fewer: 
        f.write(w+"\n")

As a result of the above code, we find all possible combinations if we change 3 places in a string with the specified alternative characters in p.

acc105
acc106
a3c105
a3c106
dbc105
dbc106
dcc125
dcc126
dcc103
d3c125
d3c126
d3c103
1bc105
1bc106
1cc125
1cc126
1cc103
13c125
13c126
13c103

The goal is to print the results faster, for example these lines should be changed I think:

with open("list.txt","w") as f: 
    for w in fewer: 
        f.write(w+"\n")

So the output will be saved as python3 py.py >> list.txt

Will enjoy to learn from your solution.


Solution

  • Your solution is based on a brute force approach. You are generating all possible alternative strings and then filtering out the ones that do not meet the criteria of only 3 changes. A better approach would be to look only at those combinations that will meet the criteria. I will ignore the part of saving to a file, since it will be the same for both solutions. A faster solution would be:

    def change_string(input_string, mapping, replace=3):
        input_string = list(input_string)
    
        to_replace = dict()
        for idx, replacement in enumerate(mapping):
            if not replacement: continue
    
            to_replace[idx] = replacement
            if input_string[idx] in replacement:
                to_replace[idx] = [char for char in replacement if char != mapping[idx]]
    
        for indices in combinations(to_replace, r=replace):
            for chars in product(*[to_replace[index] for index in indices]):
                temp = input_string[:]
                for index, char in zip(indices, chars):
                    temp[index] = char
                yield ''.join(temp)
    

    Explanation

    I change the input string to a list, so I can do the replacement faster, since lists are mutable and strings are not.

    Then I filter the mapping (p) to represent only indices that are going to be changed. This removes all empty strings and provides me with the indices that I have to look at.

    to_replace = dict()
        for idx, replacement in enumerate(mapping):
            if not replacement: continue
    
            to_replace[idx] = replacement
            if input_string[idx] in replacement:
                to_replace[idx] = [char for char in replacement if char != mapping[idx]]
    

    Note: I also make sure that the values in mapping are unequal to the original string values, which might not be what you want.

    Then I create all possible combinations of indices with the required length (replace=3).

    for indices in combinations(to_replace, r=replace):
    

    Using your example this will contain the following group of indices:

    (0, 1, 4)
    (0, 1, 5)
    (0, 4, 5)
    (1, 4, 5)
    

    Then I create all possible character combinations from those indices:

    for chars in product(*[to_replace[index] for index in indices]):
    

    For example with indices (0, 1, 4) or the values ('d1', 'c3', '0'):

    ('d', 'c', '0')
    ('d', '3', '0')
    ('1', 'c', '0')
    ('1', '3', '0')
    

    Are all the character combinations produced.

    Then I create a copy of the input string (note it is a list, so we can perform fast replacements) and replace the characters at the correct indices.

    Comparison

    • Your function
    def OP(input_string, replace=3):
        p = ["d1", "c3", "", "", "0", "56"]
        d = {idx: (v if input_string[idx] in v else input_string[idx] + v) for idx, v in enumerate(p)}
        all_of_em = (''.join(whatever) for whatever in product(*d.values()))
        fewer = [w for w in all_of_em if sum(a != b for a, b in zip(w, input_string)) == replace]
        return fewer
    

    Replace is 3

    print(timeit.timeit("OP('abc123')", setup="from __main__ import OP", number=100_000))
    # 5.6281933 seconds
    
    print(timeit.timeit("list(change_string('abc123', ['d1', 'c3', '', '', '0', '56']))",
                        setup="from __main__ import change_string", number=100_000))
    # 1.3682368 seconds
    

    Which is about 3 times as fast, now the interesting part is to see what happens if we increase the replace value to 4

    Replace is 4

    print(timeit.timeit("OP('abc123', replace=4)", setup="from __main__ import OP", number=100_000))
    # 5.5450302 seconds
    
    print(timeit.timeit("list(change_string('abc123', ['d1', 'c3', '', '', '0', '56'], replace=4))",
                        setup="from __main__ import change_string", number=100_000))
    # 0.6179974 seconds
    

    A whooping 9 times faster, since my solution only has to check a few combinations.

    Similar increase can be seen with using replace is 2 or 1.