What is the best way to unpack SequenceMatcher loop results in Python so that values can be easily accessed and processed?
from difflib import SequenceMatcher

orig = "1234567890"
commented = "123435456353453578901343154"
diff = SequenceMatcher(None, orig, commented)

# Collect every matching block (a, b, size) reported by the matcher.
match_id = []
for block in diff.get_matching_blocks():
    match_id.append(block)
print(match_id)
The digits in these strings stand in for Chinese characters.
The current iteration code stores match results in a list like this:
match_id
[Match(a=0, b=0, size=4), Match(a=4, b=7, size=2), Match(a=6, b=16, size=4), Match(a=10, b=27, size=0)]
I'd eventually like to mark out the comments with "{{" and "}}", like so:
"1234{{354}}56{{3534535}}7890{{1343154}}"
This means I am interested in unpacking the above SequenceMatcher results and doing some calculations on specific b and size values to yield this sequence:
rslt = [[0+4,7],[7+2,16],[16+4,27]]
which is a repetition of [b[i]+size[i], b[i+1]].
Unpacking SequenceMatcher results to yield a sequence
You can unzip match_id and then use a list comprehension with your expression.
a, b, size = zip(*match_id)
# a = (0, 4, 6, 10)
# b = (0, 7, 16, 27)
# size = (4, 2, 4, 0)
rslt = [[b[i] + size[i], b[i+1]] for i in range(len(match_id)-1)]
# rslt = [[4, 7], [9, 16], [20, 27]]
Reference for zip, a Python built-in function: https://docs.python.org/3/library/functions.html#zip
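As a variation on the unzip approach, the same pairs can be built by walking consecutive blocks directly, since each Match is a named tuple whose fields can be read as attributes. This is only a sketch; the names blocks, gaps, cur and nxt are illustrative and not part of the original code.

```python
from difflib import SequenceMatcher

orig = "1234567890"
commented = "123435456353453578901343154"

blocks = SequenceMatcher(None, orig, commented).get_matching_blocks()

# Pair each block with its successor. The gap between them in `commented`
# runs from the end of one match (b + size) to the start of the next (b).
gaps = [[cur.b + cur.size, nxt.b] for cur, nxt in zip(blocks, blocks[1:])]
print(gaps)  # [[4, 7], [9, 16], [20, 27]]
```

This avoids index arithmetic over three parallel tuples, at the cost of slicing the block list once.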
Marking out the comments with "{{" and "}}"
You can loop through rslt, appending the matching text seen so far and wrapping each comment in the markers.
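rslt_str = ""
prev_end = 0  # index in commented where the previous pair ended
for start, end in rslt:
    # Copy the matching text that precedes this comment.
    rslt_str += commented[prev_end:start]
    if start != end:
        # Wrap the non-matching text in comment markers.
        rslt_str += "{{%s}}" % commented[start:end]
    prev_end = end
# rslt_str = "1234{{354}}56{{3534535}}7890{{1343154}}"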
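If it helps, both steps can be folded into one helper that goes straight from the two strings to the marked-up result. The function name mark_comments and its open_tag and close_tag parameters are hypothetical, chosen for illustration. The sketch also assumes that only text added in commented needs marking; anything present in orig but missing from commented is silently ignored.

```python
from difflib import SequenceMatcher


def mark_comments(orig, commented, open_tag="{{", close_tag="}}"):
    """Wrap the parts of `commented` that do not match `orig` in tags."""
    blocks = SequenceMatcher(None, orig, commented).get_matching_blocks()
    pieces = []
    prev_end = 0  # end of the previous matching block in `commented`
    for block in blocks:
        if block.b > prev_end:
            # Text between two matches is treated as an inserted comment.
            pieces.append(open_tag + commented[prev_end:block.b] + close_tag)
        # Copy the matching text itself (empty for the final sentinel block).
        pieces.append(commented[block.b:block.b + block.size])
        prev_end = block.b + block.size
    return "".join(pieces)


print(mark_comments("1234567890", "123435456353453578901343154"))
# 1234{{354}}56{{3534535}}7890{{1343154}}
```

Collecting the pieces in a list and joining once avoids repeated string concatenation, which may matter for long inputs.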