Using re.sub to replace with the overall match

I was just writing a program where I wanted to insert a newline after a specific pattern. The idea was to match the pattern and replace it with the overall match (i.e. capture group \0) followed by \n.

import re
s = "abc"
insert_newline_pattern = re.compile(r"b")
re.sub(insert_newline_pattern, r"\0\n", s)

However, the output is a\x00\nc: \0 is read as a null character rather than as the overall match.

I know that I can "simply" rewrite this as:

s = "abc"
insert_newline_pattern = re.compile(r"(b)")
re.sub(insert_newline_pattern, r"\1\n", s)

which outputs the desired ab\nc with the idea of wrapping the overall match into group \1 and substituting this. See also a Python regex101 demo.

Is there a way to access the overall match in any way, similar to this PCRE regex101 demo in Python?

Solution

You can use the form \g<0> in Python for the zeroth group (the overall match from the pattern), which is the equivalent of $0 in PCRE (alternatively, in PCRE, you can use $& or \0 in replacement strings).

s = "abc"
insert_newline_pattern = re.compile(r"b")
re.sub(insert_newline_pattern, r"\g<0>\n", s)

Result:

'ab\nc'

This form exists to avoid the ambiguity of a reference like \10 as used in PCRE: is that the tenth backreference, or the first followed by a literal '0'? In Python, \g<1>0 is unambiguously group 1 followed by a literal '0'.
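As a minimal sketch of that ambiguity in Python (the variable names here are illustrative and not from the question): with a pattern that has only one group, \g<1>0 works as expected, while \10 is taken as a reference to a tenth group and is rejected.

```python
import re

s = "abc"
pattern = re.compile(r"(b)")  # exactly one capture group

# \g<1>0 is unambiguous: group 1 followed by a literal "0".
unambiguous = pattern.sub(r"\g<1>0", s)
print(unambiguous)  # ab0c

# \10 is parsed as a reference to group 10, which this pattern lacks,
# so re raises an error instead of guessing.
try:
    pattern.sub(r"\10", s)
    error_message = None
except re.error as exc:
    error_message = str(exc)
print(error_message)
```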

It is documented under the docs for re.sub.

Note: If you are working with a match object, such as in a lambda used as the replacement or as the result of re.search, you can also use .group(0) to the same effect:

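s = "abc123efg456hij"
# The callable's return value is inserted verbatim, so use a normal (non-raw) string to get real tabs
re.sub(r"[a-z](?!$)", lambda m: f"{m.group(0)}\t", s)
# Python 3.6+: you can use m[0] instead of m.group(0)

Result:

'a\tb\tc\t123e\tf\tg\t456h\ti\tj'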
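One detail worth keeping in mind when switching between the two styles: re.sub expands backslash escapes in a template string, but it inserts the return value of a callable as-is. The following is a small sketch of the difference, with variable names chosen only for illustration:

```python
import re

s = "abc"

# Template string: re.sub itself expands \g<0> and \n.
via_template = re.sub(r"b", r"\g<0>\n", s)

# Callable: the return value is used verbatim, so a normal
# (non-raw) string is needed to get a real newline.
via_callable = re.sub(r"b", lambda m: m[0] + "\n", s)

# A raw string in the callable leaves a literal backslash and "n".
via_raw_callable = re.sub(r"b", lambda m: m[0] + r"\n", s)

print(repr(via_template))      # 'ab\nc'
print(repr(via_callable))      # 'ab\nc'
print(repr(via_raw_callable))  # 'ab\\nc'
```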

Here is an example of using the re.Match object returned by re.search (or by any other re method that produces a match object):

>>> s='abc123'
>>> m=re.search(r'\d', s)
>>> m[0]                   # what matched? $0 in PCRE
'1'
>>> m.span()               # Where? 
(3, 4)
>>> m.re                   # With what regex?
re.compile('\\d')

If you want to see what re.sub would use as a string result, you can use match.expand:

>>> m.expand(r"\g<0>\n")
'1\n'