Search code examples
pythonstringreplacesubstring

What is the difference between re.sub and String replace in Python 3?


I'm trying to solve this problem on Kattis. I have come up with my first solution:

str_to_replace = {"RBL": "C", "RLB": "C", "LBR": "C", "LRB": "C", "BRL": "C", "BLR": "C"}
uin = input().upper()
for key, value in str_to_replace.items():
    uin = uin.replace(key, value)
uin = uin.replace("R", "S")
uin = uin.replace("B", "K")
uin = uin.replace("L", "H")
print(uin)

Submitted but the judge said wrong answer in a hidden test case.

After googling for a while, I tried re.sub and it works.

uin = input().upper()
uin = re.sub('RBL|RLB|BRL|BLR|LRB|LBR', 'C', uin)
uin = uin.replace("R", "S")
uin = uin.replace("B", "K")
uin = uin.replace("L", "H")
print(uin)

I'm trying to figure out which test case I have missed and what are the differences between them. Thanks for your help!


Solution

  • Johnny's comment nailed it.

    Basically, the re.sub version makes one pass, processing all combos from front to back. So a substring like "LRBL" will match "LRB" and replace it with "C", then move on to "L" with a result of "CL".

    On the other hand, the .replace() version uses a loop, so it processes the string in multiple passes. For the "LRBL" example, if "RBL" -> "C" happened to be processed before "LRB" -> "C", then the output would be "LC" rather than "CL".

    The other .replace() calls look suspicious, because they would do transitive replacements if there was any overlap. Luckily there isn't such a case here, so they're just a bit inefficient.

    If you use a function for your replacer and add . to the regex to handle any single-letter remnants after trying to locate the chunks of 3, you can do it in a single pass:

    import re
    
    d = dict(B="K", L="H", R="S")
    print(re.sub("RBL|RLB|LBR|LRB|BLR|BRL|.", lambda m: d.get(m[0], "C"), input()))