I was just writing a program where I wanted to insert a newline after a specific pattern. The idea was to match the pattern and replace it with the overall match (i.e. capture group \0) followed by \n.
import re
s = "abc"
insert_newline_pattern = re.compile(r"b")
re.sub(insert_newline_pattern, r"\0\n", s)
However, the output is a\x00\nc, with \0 read as a null character.
I know that I can "simply" rewrite this as:
s = "abc"
insert_newline_pattern = re.compile(r"(b)")
re.sub(insert_newline_pattern, r"\1\n", s)
which outputs the desired ab\nc, with the idea of wrapping the overall match in group \1 and substituting that. See also a Python regex101 demo.
Is there a way in Python to access the overall match directly, similar to this PCRE regex101 demo?
You can use the form \g<0> in Python for the zeroth group (the overall match of the pattern), which is the equivalent of $0 in PCRE (alternatively, in PCRE, you can use $& or \0 in replacement strings).
s="abc"
insert_newline_pattern=re.compile(r"b")
re.sub(insert_newline_pattern,r"\g<0>\n",s)
Result:
'ab\nc'
This form avoids the potential ambiguity of \10 as used in PCRE: is that the tenth backreference, or the first followed by a literal '0'?
It is documented under the docs for re.sub.
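As a small illustration of that ambiguity, here is a minimal sketch (the sample string and the single-group pattern are made up for this example). With only one capture group, \g<1>0 means group 1 followed by a literal 0, while \10 is read as a reference to group 10 and fails because the pattern has no such group.

```python
import re

s = "abc"
pattern = re.compile(r"(b)")  # exactly one capture group

# \g<1>0 is unambiguous: group 1 followed by a literal "0"
with_literal_zero = pattern.sub(r"\g<1>0", s)
print(with_literal_zero)  # ab0c

# \10 is read as a reference to group 10, which this pattern does not have
try:
    pattern.sub(r"\10", s)
    ambiguous_failed = False
except re.error as exc:
    ambiguous_failed = True
    print("error:", exc)
```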
Note: If you are working with a match object, such as in a lambda used as the replacement or the result of re.search, you can also use .group(0) for the same purpose:
s="abc123efg456hij"
re.sub(r"[a-z](?!$)",lambda m: rf"{m.group(0)}\t",s)
# In Python 3.6+ you can use m[0] instead of m.group(0)
Result:
a\tb\tc\t123e\tf\tg\t456h\ti\tj
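The same substitution can be sketched with a named function instead of a lambda (add_tab is a hypothetical helper name chosen for this example). One detail worth keeping in mind: a string returned by a replacement function is inserted as-is, without the escape processing applied to template strings. The lambda above uses a raw f-string, so it inserts a backslash followed by t; the sketch below uses a normal string literal and therefore inserts a real tab character.

```python
import re

s = "abc123efg456hij"

def add_tab(m):
    # m[0] and m.group(0) both return the overall match
    assert m[0] == m.group(0)
    # "\t" in a non-raw string literal is an actual tab character
    return m[0] + "\t"

# Every lowercase letter that is not at the end of the string gets a tab appended
result = re.sub(r"[a-z](?!$)", add_tab, s)
print(repr(result))
```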
Here is an example of using the re.Match object returned by re.search (or any other re method that produces a match object):
>>> s='abc123'
>>> m=re.search(r'\d', s)
>>> m[0] # what matched? $0 in PCRE
'1'
>>> m.span() # Where?
(3, 4)
>>> m.re # With what regex?
re.compile('\\d')
If you want to see what re.sub would use as the replacement string for a match, you can use match.expand:
>>> m.expand(r"\g<0>\n")
'1\n'