Regex with m flag in Perl vs. Python

I'm trying to automatically translate some simple Perl code with a regex to Python, and I'm having an issue. Here is the Perl code:

$stamp='[stamp]';
$message = "message\n";
$message =~ s/^/$stamp/gm;
print "$message";

Output:

[stamp]message

Here is my Python equivalent:

>>> import re
>>> re.sub(re.compile("^", re.M), "[stamp]", "message\n", count=0)
'[stamp]message\n[stamp]'

Note the answer is different (it has an extra [stamp] at the end). How do I generate code that has the same behavior for the regex?

Solution

Perl's and Python's regex engines differ slightly on what counts as a "line" in multiline mode: Perl does not treat the empty string after a trailing newline as a line, but Python does. As a result, in Python "^" with re.M also matches at the very end of "message\n".
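
To see this from the Python side, you can list the offsets where "^" matches under re.M. This is only a small illustrative check, not part of the translated code:

```python
import re

text = "message\n"

# Offsets at which "^" matches in multiline mode.
positions = [m.start() for m in re.finditer(r"^", text, re.M)]
print(positions)  # [0, 8]
```

Offset 8 is the empty string after the trailing newline, which is where the extra [stamp] comes from.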

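The best solution I can come up with is to change "^" to r"^(?=.|\n)". The lookahead only lets "^" match when at least one character follows (the "|\n" is needed because "." does not match a newline by default), so the position after a trailing newline is skipped. Note the r prefix, which makes the string a raw literal; all regexes should use raw literals. You can also simplify a bit by calling methods on the compiled regex, or by calling re.sub with the uncompiled pattern, and since count=0 is already the default, you can omit it. Thus, the final code would be either: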

re.compile(r"^(?=.|\n)", re.M).sub("[stamp]", "message\n")

or:

re.sub(r"^(?=.|\n)", "[stamp]", "message\n", flags=re.M)

Even better would be:

start_of_line = re.compile(r"^(?=.|\n)", re.M)  # Done once up front

start_of_line.sub("[stamp]", "message\n")  # Done on demand

This avoids recompiling the pattern (or re-checking the re module's compiled-regex cache) on every call, because the compiled regex is created just once and reused.
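
As a rough sketch, the compiled pattern can be wrapped in a small helper and tried on a few inputs. The names START_OF_LINE and stamp_lines are made up for this example, and the function replacement is my own choice rather than something the translation requires:

```python
import re

# Hypothetical names; the pattern is the lookahead version suggested above.
START_OF_LINE = re.compile(r"^(?=.|\n)", re.M)

def stamp_lines(text, stamp="[stamp]"):
    """Prefix each line of text with stamp, skipping the empty
    'line' that follows a trailing newline."""
    # A function replacement avoids backslash processing in the stamp.
    return START_OF_LINE.sub(lambda m: stamp, text)

print(repr(stamp_lines("message\n")))  # '[stamp]message\n'
print(repr(stamp_lines("a\n\nb")))     # '[stamp]a\n[stamp]\n[stamp]b'
print(repr(stamp_lines("")))           # ''
```

Note the last case: with this pattern an empty input string is left unchanged, so if that matters for your data it is worth checking against what the Perl code does.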

Alternative solutions:

Split up the lines in a way that will match Perl's definition of a line, then use the non-re.MULTILINE version of the regex per line, then shove them back together, e.g.:

import re

start_of_line = re.compile(r"^")  # Compile once up front without re.M

# Split lines, keeping ends, so the empty string after a trailing newline
# is not treated as a line, then substitute on a line-by-line basis
''.join(start_of_line.sub("[stamp]", line) for line in "message\n".splitlines(keepends=True))
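
One caveat with this approach: str.splitlines treats more characters than "\n" as line boundaries (for example "\r", "\x0b", "\x0c" and "\u2028"), whereas "^" in multiline mode only looks at "\n". If the input may contain such characters, a sketch that splits on "\n" only might look like this (LINE and stamp_lines_split are names invented for illustration):

```python
import re

# Matches one line at a time: text up to and including "\n", or a final
# unterminated line. Only "\n" counts as a line ending here.
LINE = re.compile(r"[^\n]*\n|[^\n]+")

def stamp_lines_split(text, stamp="[stamp]"):
    """Prefix each "\n"-terminated line (and any final partial line) with stamp."""
    return "".join(stamp + line for line in LINE.findall(text))

print(repr(stamp_lines_split("message\n")))  # '[stamp]message\n'
print(repr(stamp_lines_split("a\rb\n")))     # '[stamp]a\rb\n'

# For comparison, splitlines also breaks at the bare "\r":
print("a\rb\n".splitlines(keepends=True))    # ['a\r', 'b\n']
```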

Strip a single trailing newline, if it exists, up-front, perform regex substitution, add back newline (if applicable):

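start_of_line = re.compile(r"^", re.M)  # Plain "^" with re.M; no lookahead needed here

message = '...'
if message.endswith('\n'):
    # Drop the final newline so no empty "line" follows it, then restore it
    result = start_of_line.sub("[stamp]", message[:-1]) + '\n'
else:
    result = start_of_line.sub("[stamp]", message)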

Neither option is as succinct or efficient as tweaking the regex, but if arbitrary user-supplied regexes must be handled, there will always be a corner case, and pre-processing the input into a form that removes the Perl/Python incompatibility is a lot safer.