python python-3.x regex negative-lookbehind

Python regex negative lookbehind including start of line


Consider the following input:

"aaa"|"bbb"|"123"|"!"\\"|"2010-01-04T00:00:01"

I am trying to write a regex that matches a double quote character and replaces it with a tilde if:

  • it is not preceded or followed by the delimiter | AND
  • it is not at the start of the line AND
  • it is not at the end of the line

In PHP I am able to get the regex pictured below working:

[image: php_regex]

Because of constraints in Python's re module, the same regex fails with the following error:

re.error: look-behind requires fixed-width pattern

My Python code is as follows:

import re
orig_line = r'"aaa"|"bbb"|"123"|"!"\\"|"2010-01-04T00:00:01"'
new_line = re.sub(pattern='(?<!\||^)\"(?!\||$)',repl='~',string=orig_line)

How can I adjust this regex so it works in Python?

Similar questions exist on SO, but I couldn't find any that address the start/end of line requirement.


Solution

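  • You can use

    (?<=[^|])"(?=[^|])

    The (?<=[^|]) lookbehind matches a location that is immediately preceded by any character other than |, so it cannot match at the start of the string. Likewise, the (?=[^|]) lookahead requires a character other than | immediately after the quote, so it cannot match at the end of the string. The lookbehind is exactly one character wide, which satisfies Python's fixed-width look-behind requirement.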

    See the Python demo:

    import re
    orig_line = '"aaa"|"bbb"|"123"|"!"\\"|"2010-01-04T00:00:01"'
    new_line = re.sub(r'(?<=[^|])"(?=[^|])', '~', orig_line)
    print(new_line) # => "aaa"|"bbb"|"123"|"!~\"|"2010-01-04T00:00:01"
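
One caveat with the pattern above: [^|] also matches a newline, so on multi-line input a quote at the start or end of an inner line would still be replaced. The sketch below compares the answer's pattern with two variants that are not part of the original answer and should be read as assumptions. One splits the failing alternation into separate lookarounds (Python rejects (?<!\||^) because the alternatives \| and ^ have widths 1 and 0, but accepts each one on its own) and uses re.MULTILINE. The other adds \n to the negated character classes. The names positive, negative, multiline_positive and two_lines are made up for illustration.

```python
import re

# Sample line from the question, with a single backslash as in the demo above.
orig_line = '"aaa"|"bbb"|"123"|"!"\\"|"2010-01-04T00:00:01"'

# Pattern from the answer: one-character positive lookarounds.
positive = re.compile(r'(?<=[^|])"(?=[^|])')

# Assumed variant: keep the negative lookarounds from the question, but split
# each alternation so that every lookbehind has a single fixed width.
# re.MULTILINE makes ^ and $ apply to every line, not only the whole string.
negative = re.compile(r'(?<!\|)(?<!^)"(?!\|)(?!$)', re.MULTILINE)

# Assumed variant: positive lookarounds that also refuse a newline neighbour.
multiline_positive = re.compile(r'(?<=[^|\n])"(?=[^|\n])')

# On the single-line sample all three patterns give the same result.
for pattern in (positive, negative, multiline_positive):
    print(pattern.sub('~', orig_line))

# Hypothetical two-line input with an inner quote on each line.
two_lines = '"a"b"|"c"\n"d"|"e"f"'

# The answer's pattern also replaces the quotes next to the newline.
print(repr(positive.sub('~', two_lines)))

# Both variants leave the quotes at the line boundaries untouched.
print(repr(negative.sub('~', two_lines)))
print(repr(multiline_positive.sub('~', two_lines)))
```

If the input is always processed one line at a time with the trailing newline stripped, the answer's original pattern is sufficient and the variants are unnecessary.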