python, iterable-unpacking

Unpacking SequenceMatcher loop results


What is the best way to unpack SequenceMatcher loop results in Python so that values can be easily accessed and processed?

from difflib import SequenceMatcher

orig = "1234567890"

commented = "123435456353453578901343154"

diff = SequenceMatcher(None, orig, commented)

match_id = []
for block in diff.get_matching_blocks():
    match_id.append(block)

print(match_id)

The digits in these strings stand in for Chinese characters.

The current iteration code stores match results in a list like this:

match_id
[Match(a=0, b=0, size=4), Match(a=4, b=7, size=2), Match(a=6, b=16, size=4), Match(a=10, b=27, size=0)]

I'd eventually like to mark out the comments with "{{" and "}}" like so:

"1234{{354}}56{{3534535}}7890{{1343154}}"

In other words, I want to unpack the SequenceMatcher results above and do some arithmetic on specific b and size values to produce this sequence:

rslt = [[0+4,7],[7+2,16],[16+4,27]]

that is, [b[i]+size[i], b[i+1]] repeated for each pair of consecutive matches.


Solution

    1. Unpacking SequenceMatcher results to yield a sequence

    You can unzip match_id into three tuples with zip(*match_id) and then apply your expression in a list comprehension.

    a, b, size = zip(*match_id)
    # a    = (0, 4,  6, 10)
    # b    = (0, 7, 16, 27)
    # size = (4, 2,  4,  0)
    
    rslt = [[b[i] + size[i], b[i+1]] for i in range(len(match_id)-1)]
    # rslt = [[4, 7], [9, 16], [20, 27]]
    

    Reference for zip, a Python built-in function: https://docs.python.org/3/library/functions.html#zip
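
    As an alternative sketch, the same pairs can be built without the intermediate a, b and size tuples: each Match is a named tuple, so its fields can be read by name, and zipping the list with itself shifted by one pairs every block with its successor. The block below rebuilds match_id from the question's strings so that it runs on its own.

```python
from difflib import SequenceMatcher

orig = "1234567890"
commented = "123435456353453578901343154"

match_id = SequenceMatcher(None, orig, commented).get_matching_blocks()

# Pair each block with its successor: the gap between the end of one
# match in `commented` (b + size) and the start of the next (b) is a comment.
rslt = [[cur.b + cur.size, nxt.b] for cur, nxt in zip(match_id, match_id[1:])]

print(rslt)  # [[4, 7], [9, 16], [20, 27]]
```

    This avoids indexing by position, at the cost of one extra slice of match_id.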

    2. Marking out the comments with "{{" and "}}"

    You can loop through rslt, appending the matched text since the previous comment and then the comment itself wrapped in "{{" and "}}".

    rslt_str = ""
    prev_end = 0
    
    for start, end in rslt:
        rslt_str += commented[prev_end:start]
        if start != end:
            rslt_str += "{{%s}}" % commented[start:end]
        prev_end = end
    # rslt_str = "1234{{354}}56{{3534535}}7890{{1343154}}"
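
    A different route, shown here only as a sketch, is SequenceMatcher.get_opcodes(), which describes how to turn orig into commented as (tag, i1, i2, j1, j2) tuples. The function name mark_comments and its open_mark/close_mark parameters are made up for this example; it wraps every stretch of commented that is not tagged 'equal'.

```python
from difflib import SequenceMatcher


def mark_comments(orig, commented, open_mark="{{", close_mark="}}"):
    """Wrap the parts of `commented` that are absent from `orig` (hypothetical helper)."""
    parts = []
    matcher = SequenceMatcher(None, orig, commented)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        chunk = commented[j1:j2]
        if tag == "equal":
            parts.append(chunk)
        elif chunk:  # 'insert' or 'replace'; a 'delete' has an empty chunk
            parts.append(open_mark + chunk + close_mark)
    return "".join(parts)


marked = mark_comments("1234567890", "123435456353453578901343154")
print(marked)  # 1234{{354}}56{{3534535}}7890{{1343154}}
```

    As with the loop above, text that exists only in orig leaves no trace in the output; collecting the pieces in a list and joining once also avoids repeated string concatenation.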