How to perform this conditional regex replacement task, using lookarounds for bracket & quotation characters?


Currently I have this for my regex replacement:

import re
line = re.sub(r"\bstr\b", ":class:`str`", line)

I want the result shown below: a str that sits directly next to <, >, or ` should not be replaced, and neither should a str inside square brackets. I tried a negative lookaround for just one of these characters, but I couldn't make it work.

Example input:

line = r"""list of str and list of :class:`str` or :class:`string <str>` or Union[str, Tuple[int, str]]"""

Example output of what I am aiming for:

"list of :class:`str` and list of :class:`str` or :class:`string <str>` or Union[str, Tuple[int, str]]"

Solution

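  • Here is a solution using a negative lookbehind and a negative lookahead. The match is rejected when str is immediately preceded by [, `, or <, or immediately followed by ], `, or >.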

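    import re

    line = r"""list of str and list of :class:`str` or :class:`string <str>` or Union[str, Tuple[int, str]]"""
    # \b keeps the pattern from matching "str" inside longer words such as "string".
    pattern = r"(?<![\[`<])\bstr\b(?![\]`>])"
    print(re.sub(pattern, r":class:`str`", line))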
    

    Output:

    list of :class:`str` and list of :class:`str` or :class:`string <str>` or Union[str, Tuple[int, str]]
    

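    UPDATE (in response to a question in the comments):
    Here is a conditional substitution approach, based on an idea by @Valdi_Bo. Instead of a replacement string, re.sub receives a function that chooses the replacement for each match.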

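    import re

    line = r"""list of str and list of :class:`str` or :class:`string <str>` or Union[str, Tuple[int, str]]"""
    pattern = r"\bstr\b"

    def conditional_sub(match):
        # Slices yield "" at the string boundaries, so a match at the very
        # start or end of line neither wraps around nor raises IndexError.
        before = line[match.start() - 1:match.start()]
        after = line[match.end():match.end() + 1]
        if before not in ("[", "`", "<") and after not in ("]", "`", ">"):
            return ":class:`str`"
        return "~str"

    print(re.sub(pattern, conditional_sub, line))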
    

    Output:

    list of :class:`str` and list of :class:`~str` or :class:`string <~str>` or Union[~str, Tuple[int, ~str]]
    

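    match.start() and match.end() are the indices in line where the match begins and where it ends (the end index is exclusive). With them we can inspect the characters directly before and after the match, just as the lookarounds in the first pattern do, and decide what to replace it with.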
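Both approaches above look only at the characters directly next to the match, so a str in the middle of a bracketed list, as in Tuple[int, str, int], would still be replaced. If "inside square brackets" should hold at any position, one possible extension is to use match.start() to examine everything before the match. The following is a minimal sketch, not part of the original answer: annotate_str is a hypothetical helper name, and it assumes that brackets and backticks are balanced and that <str> targets only occur inside backtick spans, as in the example input.

```python
import re


def annotate_str(line: str) -> str:
    """Wrap bare ``str`` in :class:`str`, skipping [...] and backtick spans.

    Hypothetical helper; assumes balanced brackets and backticks.
    """

    def replace(match):
        # Everything in the original string that precedes this match.
        prefix = match.string[:match.start()]
        # More "[" than "]" so far means we are inside square brackets.
        in_brackets = prefix.count("[") > prefix.count("]")
        # An odd number of backticks so far means we are inside a backtick span.
        in_backticks = prefix.count("`") % 2 == 1
        if in_brackets or in_backticks:
            return match.group(0)  # leave the text unchanged
        return ":class:`str`"

    return re.sub(r"\bstr\b", replace, line)


example = r"""list of str and list of :class:`str` or :class:`string <str>` or Union[str, Tuple[int, str]]"""
print(annotate_str(example))
print(annotate_str("Tuple[int, str, int] or str"))
```

Counting characters in the prefix rescans the line for every match, which is fine for single docstring lines but would be worth replacing with a single pass if the input were large.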