Take this simple example in perl v5.22.0:
my $data = "foobar\n";
$data =~ s/(?<!bar)(\s*)$/qux$1/;
print $data;
It prints:
foobar
qux
but I didn't expect $data to change. I also tried some earlier versions of Perl 5.x, with the same result.
Conversely, I'd expect this string with the same regex to cause a replacement but it doesn't:
my $data = "foobaz\n";
$data =~ s/(?<!bar)(\s*)$/qux$1/;
print $data;
I don't understand why this happens. In both cases the asterisk is supposed to be greedy. I figured $1 would be \n, making the negative look-behind compare against bar in the first example and baz in the second example. Regex101, when set to Perl, says:
Quantifier: * Between zero and unlimited times, as many times as possible, giving back as needed.
So is what happens in this case that it gives back to the negative look-behind?
As the title says, the real issue is that I'd like to stop the look-behind from swallowing that second group. Unfortunately, it's not a single letter; that is just for the example, to make it easier to understand. Also, in Perl I'm somewhat limited in what I can do with the negative look-behind; for example, "Variable length lookbehind not implemented in regex". If possible, I'd like an answer that is compatible with Perl 5.8. Thanks
I think you want
$data =~ s/(?<!bar)(?<!\s)(\s*)$/qux$1/;
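As a quick check of that suggestion against the strings from the question, here is a minimal sketch. The helper name add_qux and the third input (trailing spaces before the newline) are made up for illustration; only the substitution itself comes from the line above.

```perl
use strict;
use warnings;

# Hypothetical helper: applies the two-look-behind substitution to a copy
# of its argument and returns the result.
sub add_qux {
    my ($data) = @_;
    $data =~ s/(?<!bar)(?<!\s)(\s*)$/qux$1/;
    return $data;
}

for my $in ("foobar\n", "foobaz\n", "foobar  \n") {
    my $out    = add_qux($in);
    my $status = $out eq $in ? "unchanged" : "changed";
    print "$status: $out";
}
```

With these inputs, the strings ending in bar should come back unchanged, while "foobaz\n" should become "foobazqux\n". The extra (?<!\s) is what rules out the match at the very end of the string, after the trailing whitespace.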
The following version will work with 5.8, and I think it's actually faster (since it jumps to the end of the string and backtracks, rather than checking two look-behinds at every position):
$data =~ s/
   ^
   (
      (?:
         .*
         (?: [^r\s]
         |   [^a] r
         |   [^b] ar
         )
      )?
   )
   ( \s* )
   \z
/${1}qux$2/sx;
($ could be used instead of \z; it's just a micro-optimization.)
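To try the anchored version on the question's inputs, a small sketch follows. The wrapper name add_qux_anchored and the third input (no trailing newline) are illustrative assumptions; the pattern is the one given above, unchanged.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the anchored substitution from the answer.
sub add_qux_anchored {
    my ($data) = @_;
    $data =~ s/
        ^
        (
            (?:
                .*
                (?: [^r\s]
                |   [^a] r
                |   [^b] ar
                )
            )?
        )
        ( \s* )
        \z
    /${1}qux$2/sx;
    return $data;
}

for my $in ("foobar\n", "foobaz\n", "foobaz") {
    my $out    = add_qux_anchored($in);
    my $status = $out eq $in ? "unchanged" : "changed";
    print "$status: $out\n";
}
```

Here "foobar\n" should stay as it is, and both foobaz inputs should get qux inserted before any trailing whitespace. Note that the three alternatives spell out "does not end in bar" by hand, so they would need rewriting for a different word.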
Without the m flag, $ is equivalent to (?=\n?\z), which is to say it matches before a newline at the end of the string and at the end of the string. This means there are two possible places for $ to match in foobar␊:

foobar␊     (There's a LF at position 6, in
01234567     case your font can't show it.)
      ^^
(?<!bar) prevents the first location from being considered, but it allows the second.
(?<!bar)(\s*)$ matches 0 characters at position 7, because:
(?<!bar) matches 0 characters at position 7.
(\s*) matches 0 characters at position 7.
$ matches 0 characters at position 7.
It's the only possible match, so greediness is not relevant.
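The positions described above can be checked with a short sketch. The helper names match_positions and match_info are made up for illustration; they rely on the built-in @- array, whose first element holds the start offset of the last successful match.

```perl
use strict;
use warnings;

# Hypothetical helper: every offset at which $re matches in $str.
sub match_positions {
    my ($str, $re) = @_;
    my @positions;
    push @positions, $-[0] while $str =~ /$re/g;
    return @positions;
}

# Hypothetical helper: start offset and length of $1 for the question's
# pattern, or an empty list if it does not match.
sub match_info {
    my ($str) = @_;
    return unless $str =~ /(?<!bar)(\s*)$/;
    return ($-[0], length $1);
}

print join(",", match_positions("foobar\n", qr/$/)), "\n";          # 6,7
print join(",", match_positions("foobar\n", qr/\z/)), "\n";         # 7
print join(",", match_positions("foobar\n", qr/(?<!bar)$/)), "\n";  # 7
print join(" ", match_info("foobar\n")), "\n";                      # 7 0
print join(" ", match_info("foobaz\n")), "\n";                      # 6 1
```

For "foobar\n" the match starts at position 7 with an empty $1, so qux lands after the newline. For "foobaz\n" it starts at position 6 and $1 holds the newline, which is the case where greediness does come into play.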