
Using re.sub and replace with overall match


I was writing a program in which I wanted to insert a newline after a specific pattern. The idea was to match the pattern and replace it with the overall match (i.e. capture group \0) followed by \n.

import re

s = "abc"
insert_newline_pattern = re.compile(r"b")
re.sub(insert_newline_pattern, r"\0\n", s)

However, the output is a\x00\nc: the \0 is read as a null character.

I know that I can "simply" rewrite this as:

s = "abc"
insert_newline_pattern = re.compile(r"(b)")
re.sub(insert_newline_pattern, r"\1\n", s)

which outputs the desired ab\nc with the idea of wrapping the overall match into group \1 and substituting this. See also a Python regex101 demo.

Is there a way to access the overall match in any way, similar to this PCRE regex101 demo in Python?


Solution

  • You can use the form \g<0> in Python for the zeroth group (the overall match of the pattern), which is the equivalent of $0 in PCRE (alternatively, in PCRE, you can use $& or \0 in replacement strings).

    s="abc"
    insert_newline_pattern=re.compile(r"b")
    re.sub(insert_newline_pattern,r"\g<0>\n",s)
    

    Result:

    'ab\nc'
    

    This form exists to avoid the ambiguity of a reference such as \10, as used in PCRE: is that the tenth backreference, or the first followed by a literal '0'? Writing \g<1>0 makes the second reading explicit.

    It is documented in the Python docs for re.sub.


    Note: If you are working with a match object, such as in a lambda used as the replacement or the result of re.search, you can also use .group(0) for the same purpose:

    s="abc123efg456hij"
    re.sub(r"[a-z](?!$)",lambda m: rf"{m.group(0)}\t",s)
    # Python 3.6+: you can use m[0] instead of m.group(0)
    

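    Result (as printed; the return value of a replacement function is inserted verbatim, so the \t from the raw f-string is a literal backslash followed by t, not a tab):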

    a\tb\tc\t123e\tf\tg\t456h\ti\tj
    

    Here is an example of using the re.Match object returned by re.search (or by any other re function that produces a match object):

    >>> s='abc123'
    >>> m=re.search(r'\d', s)
    >>> m[0]                   # what matched? $0 in PCRE
    '1'
    >>> m.span()               # Where? 
    (3, 4)
    >>> m.re                   # With what regex?
    re.compile('\\d')          
    

    If you want to see the string that re.sub would produce from a replacement template for a given match, you can use Match.expand:

    >>> m.expand(r"\g<0>\n")
    '1\n'
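
To tie the points above together, here is a minimal sketch that puts the \g<0> template, a callable replacement, and the \10 ambiguity side by side. The variable names (whole, via_callable, group_then_zero, err) are only illustrative, and the pattern is the same toy pattern from the question, so treat it as a demonstration rather than a recommended recipe.

```python
import re

s = "abc"

# \g<0> refers to the whole match, so the pattern needs no capture group.
whole = re.sub(r"b", r"\g<0>\n", s)

# A callable replacement receives the match object; m[0] is the whole match.
# Its return value is inserted as-is, so a normal "\n" string is used here.
via_callable = re.sub(r"b", lambda m: m[0] + "\n", s)

# \g<1>0 means "group 1, then a literal 0".
group_then_zero = re.sub(r"(b)", r"\g<1>0", s)

# \10 is read as a reference to group 10, which this pattern does not have,
# so the replacement template is rejected.
err = None
try:
    re.sub(r"(b)", r"\10", s)
except re.error as exc:
    err = exc

print(repr(whole))            # 'ab\nc'
print(repr(via_callable))     # 'ab\nc'
print(repr(group_then_zero))  # 'ab0c'
print(type(err).__name__, err)
```

The template form is usually the shortest when the replacement is fixed text around the match; the callable form is worth the extra typing only when the replacement has to be computed from the match.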