Search code examples
pythonpython-itertools

Python - build new string of specific length with n replacements from specific alphabet


I have been working on a fast, efficient way to solve the following problem, but as of yet, I have only been able to solve it using a rather slow, nest-loop solution. Anyways, here is the description:

So I have a string of length L, lets say 'BBBX'. I want to find all possible strings of length L, starting from 'BBBX', that differ at, at most, 2 positions and, at least, 0 positions. On top of that, when building the new strings, new characters must be selected from a specific alphabet.

I guess the size of the alphabet doesn't matter, so lets say in this case the alphabet is ['B', 'G', 'C', 'X'].

So, some sample output would be, 'BGBG', 'BGBC', 'BBGX', etc. For this example with a string of length 4 with up to 2 substitutions, my algorithm finds 67 possible new strings.

I have been trying to use itertools to solve this problem, but I am having a bit of difficulty finding a solution. I try to use itertools.combinations(range(4), 2) to find all the possible positions. I am then thinking of using product() from itertools to build all of the possibilities, but I am not sure if there is a way I could connect it somehow to the indices from the output of combinations().


Solution

  • Here's my solution.

    The first for loop tells us how many replacements we will perform. (0, 1 or 2 - we go through each)

    The second loop tells us which letters we will change (by their indexes).

    The third loop goes through all of the possible letter changes for those indexes. There's some logic to make sure we actually change the letter (changing "C" to "C" doesn't count).

    import itertools
    
    def generate_replacements(lo, hi, alphabet, text):
        for count in range(lo, hi + 1):
            for indexes in itertools.combinations(range(len(text)), count):
                for letters in itertools.product(alphabet, repeat=count):
                    new_text = list(text)
                    actual_count = 0
                    for index, letter in zip(indexes, letters):
                        if new_text[index] == letter:
                            continue
                        new_text[index] = letter
                        actual_count += 1
                    if actual_count == count:
                        yield ''.join(new_text)
    
    for text in generate_replacements(0, 2, 'BGCX', 'BBBX'):
        print text
    

    Here's its output:

    BBBX GBBX CBBX XBBX BGBX BCBX BXBX BBGX BBCX BBXX BBBB BBBG BBBC GGBX
    GCBX GXBX CGBX CCBX CXBX XGBX XCBX XXBX GBGX GBCX GBXX CBGX CBCX CBXX
    XBGX XBCX XBXX GBBB GBBG GBBC CBBB CBBG CBBC XBBB XBBG XBBC BGGX BGCX
    BGXX BCGX BCCX BCXX BXGX BXCX BXXX BGBB BGBG BGBC BCBB BCBG BCBC BXBB
    BXBG BXBC BBGB BBGG BBGC BBCB BBCG BBCC BBXB BBXG BBXC