I am trying to remove comments from my TeX code. I want to trim the text after %, but want to avoid the escaped \%.
I thought that this would do it:
re.sub(r"([^%]*)([^\\][%])(.*)$", r"\1", "10 \% foo.% bar")
which outputs an almost-right result:
'10 \\% foo'
The expected output is:
'10 \\% foo.'
Why does it trim away the last character before the %? And how can I avoid it?
Your problem is that your regex matches [zero or more non-percent characters (group 1)], then [a non-backslash character followed by a percent character (group 2)], then the rest of the line (group 3).
You replace this entire match with group 1 only, so you lose the non-backslash character captured in group 2 (here, the '.').
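To see this concretely, here is a small sketch that runs the question's pattern with re.search and prints what each group captured. It uses the raw string r"10 \% foo.% bar", which holds the same characters as the question's string but avoids the invalid-escape warning that newer Python versions emit for "\%".

```python
import re

# The question's pattern: group 1 = run of non-'%' characters,
# group 2 = one non-backslash character followed by '%',
# group 3 = the rest of the line.
pattern = r"([^%]*)([^\\][%])(.*)$"
text = r"10 \% foo.% bar"  # same characters as "10 \% foo.% bar"

m = re.search(pattern, text)
print(m.span())    # (5, 15): no match can start before index 5
print(m.groups())  # (' foo', '.%', ' bar'): the '.' sits in group 2

# Replacing the whole match with group 1 drops group 2, including the '.'.
print(re.sub(pattern, r"\1", text))  # 10 \% foo
```

Note that the escaped \% survives only because no match can start before index 5; the '.' is lost because [^\\] has to consume a real character.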
Instead, use a negative lookbehind, which matches a percent character only when it is not preceded by a backslash, and then match everything up to the end of the line:
(?<!\\)%.*$
In Python:
>>> re.sub(r"(?<!\\)%.*$", "", "10 \% foo.% bar")
'10 \\% foo.'
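The key difference from the original pattern is that a lookbehind checks the preceding character without consuming it. A quick sketch comparing every % with only the unescaped ones (again using a raw string for the input):

```python
import re

text = r"10 \% foo.% bar"

# Every '%' in the string, escaped or not.
all_percents = [m.start() for m in re.finditer(r"%", text)]

# Only '%' characters whose immediately preceding character is not a backslash.
# (?<!\\) tests that position but consumes nothing, so the character before
# the '%' never becomes part of the match (unlike [^\\] in the question).
comment_starts = [m.start() for m in re.finditer(r"(?<!\\)%", text)]

print(all_percents)    # [4, 10]
print(comment_starts)  # [10]

# Because nothing before the '%' is consumed, the replacement can simply be "".
m = re.search(r"(?<!\\)%.*$", text)
print(m.group())       # % bar
```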
With a multi-line string, use the re.M (re.MULTILINE) flag so that $ also matches at the end of each line, not only at the end of the string:
>>> ss = """10 \% foo.% bar
Hello world
Hello world % this is a comment
% This is also a comment
"""
>>> print(re.sub(r"(?<!\\)%.*$", "", ss, flags=re.M))
10 \% foo.
Hello world
Hello world
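One limitation worth knowing: the lookbehind only inspects the single character before the %, so a real comment that follows a TeX line break, as in \\% ..., is kept, because its % is preceded by a backslash. The sketch below is a hypothetical helper (strip_tex_comments is my own name, not part of any library) that additionally allows an even run of backslashes before the %. It assumes plain TeX text; verbatim contexts such as \verb, the verbatim environment, or \url are not handled.

```python
import re

# Hypothetical helper, not from the original answer.
# (?<!\\)      the run below must not start right after a backslash
# ((?:\\\\)*)  zero or more *pairs* of backslashes (e.g. the TeX line break \\)
# %.*$         the comment itself, up to the end of the line (re.M)
# Assumption: plain TeX text only; \verb, verbatim and \url are not special-cased.
COMMENT_RE = re.compile(r"(?<!\\)((?:\\\\)*)%.*$", flags=re.M)


def strip_tex_comments(source: str) -> str:
    """Remove unescaped %-comments, keeping any backslash pairs before the %."""
    return COMMENT_RE.sub(r"\1", source)


doc = r"""10 \% foo.% bar
first line\\% a real comment after a line break
Hello world % this is a comment
% This is also a comment
"""
print(strip_tex_comments(doc))

# For comparison, the single-character lookbehind keeps the "\\%" comment:
print(re.sub(r"(?<!\\)%.*$", "", r"first line\\% a real comment", flags=re.M))
# -> first line\\% a real comment
```

The backslash pairs captured in group 1 are put back via \1, so the \\ line break survives while the comment after it is removed; an odd number of backslashes (as in \\\%) still ends in an escaped \% and is left alone.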