
Why does Python 3's regex string substitution (re.sub) swallow characters?


import re

base_path = "c:\\five"
print(base_path)
filename = "<data>\\a.txt"
filename = re.sub(r'(?i)<data>', base_path, filename) 
print(filename)

Output:

c:\five
c:
ive\a.txt

The expected output is c:\five\a.txt.

The same code doesn't do this in Python 2.

Compiling the pattern first, as in the following, gives the same result.

reg = re.compile(re.escape('<data>'), re.IGNORECASE)
filename = reg.sub(base_path, filename)

Solution

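  • The literal "c:\\five" holds the characters c:\five. When that string is used as the replacement in re.sub, the regex engine processes backslash escapes in it, so \f becomes a form-feed character and the f is swallowed. It is surprising that this happens in the replacement string, but you can work around it by doubling the backslashes again (written as "c:\\\\five" in a literal). Alternatively, pass the replacement as a function; its return value is inserted as-is, without escape processing: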

    base_path = "c:\\five"
    filename = "<data>\\a.txt"
    filename = re.sub(r'(?i)<data>', lambda _: base_path, filename) 
    print(filename)
    

    Output: c:\five\a.txt

    See the docs for details:

    repl can be a string or a function; if it is a string, any backslash escapes in it are processed. That is, \n is converted to a single newline character, \r is converted to a carriage return, and so forth.
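
The behaviour described above can be checked side by side. The following is a minimal sketch, not part of the original answer: the variable names (template, broken, escaped, fixed, also_fixed) are made up for illustration, and doubling the backslashes with str.replace is just one way to apply the workaround when the path arrives in a variable rather than a literal.

```python
import re

base_path = "c:\\five"        # the actual characters are c:\five
template = "<data>\\a.txt"    # hypothetical name for the input string

# 1. Plain string replacement: re.sub reads \f in the replacement
#    as a form-feed escape, so the "f" disappears.
broken = re.sub(r'(?i)<data>', base_path, template)
print(repr(broken))           # repr shows the form feed as \x0c

# 2. Double every backslash so the replacement template yields
#    literal backslashes again.
escaped = base_path.replace("\\", "\\\\")
fixed = re.sub(r'(?i)<data>', escaped, template)
print(fixed)

# 3. Callable replacement: the return value is inserted as-is.
also_fixed = re.sub(r'(?i)<data>', lambda _: base_path, template)
print(also_fixed)
```

Printing repr(broken) rather than the string itself makes the stray form feed visible, which is easier to spot than the odd line break in the question's output. The callable form is probably the safer choice when the replacement comes from user input or a file path, since it needs no escaping at all.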