I'm studying regular expressions from Mastering Regular Expressions, 3rd Edition, and I've come across the statement that $ is a bit more complex than ^, which surprised me as I thought they were "symmetrical", except when they are escaped to mean their literal counterparts.
In fact, on page 129, their descriptions are slightly different, with more words spent on $; however, I'm still confused about it.
For ^, only two clear alternatives are described:
Caret ^ matches at the beginning of the text being searched, and, if in an enhanced line-anchor mode, after any newline. [...]
For $, the description is more obscure to me:
$ [...] matches at the end of the target string, and before a string-ending newline, as well. The latter is common, to allow an expression like s$ (ostensibly, to match "a line ending with s") to match …s<NL>, a line ending with s that's capped with an ending newline.
Two other common meanings for $ are to match only at the end of the target text, and to match before any newline.
The latter two meanings seem pretty symmetric to those described for ^, but what about the string-ending newline meaning?
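The asymmetry in question can be checked directly. The following is a minimal Perl sketch, not taken from the book: the variable names are made up for illustration, and it assumes Perl's default (non-/m) behaviour for the first two checks and the /m modifier as the "enhanced line-anchor mode" for the last two.

```perl
use strict;
use warnings;

# Default mode (no /m): $ tolerates one string-ending newline...
my $dollar_before_final_nl = ("foo\n" =~ /foo$/) ? 1 : 0;

# ...but ^ has no counterpart: it does not skip a string-leading newline.
my $caret_after_leading_nl = ("\nfoo" =~ /^foo/) ? 1 : 0;

# With /m, both act as line anchors and the behaviour is symmetric again.
my $caret_multiline  = ("\nfoo" =~ /^foo/m)     ? 1 : 0;
my $dollar_multiline = ("foo\nbar" =~ /foo$/m)  ? 1 : 0;

print "\$ before string-ending newline: $dollar_before_final_nl\n";
print "^ after string-leading newline: $caret_after_leading_nl\n";
print "^ after newline with /m:        $caret_multiline\n";
print "\$ before newline with /m:       $dollar_multiline\n";
```

Only the second check fails, which is the sense in which $ has one meaning more than ^.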
Searching for [regex] "string-ending newline" only gives one, two, and three results, at the moment, and all of them refer to
$
Matches the ending position of the string or the position just before a string-ending newline. In line-based tools, it matches the ending position of any line.
The zero-width assertion $ asserts position at the end of the string, or before the line terminator right at the end of the string (if any).
It will be clearer with these code snippets in Perl:
$str = 'abc
foo';
$str =~ s/\w+$/#/;
print "1. <" . $str . ">\n\n";
$str = 'abc
foo
';
$str =~ s/\w+$/#/;
print "2. <" . $str . ">\n\n";
# Case 3: the string ends with two newlines (note the empty line before the closing quote).
$str = 'abc
foo

';
$str =~ s/\w+$/#/;
print "3. <" . $str . ">\n\n";
This will generate this output:
1. <abc
#>
2. <abc
#
>
3. <abc
foo

>
As you can see, $ matches in cases 1 and 2 because it matches at the end of the string (case 1) or before the line break right at the end (case 2). However, case 3 remains unmatched because the line break that follows foo is not the one right at the end of the string.
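For comparison, the two other meanings quoted from the book can be tried on the same three strings. This is a minimal sketch under my own reading of how those meanings map onto Perl: \z for "only at the end of the target text" and the /m modifier for "before any newline". The hash names and the explicit \n spelling of the strings are mine, not part of the snippets above.

```perl
use strict;
use warnings;

# The three test strings from the cases above, written with explicit \n.
my %cases = (
    1 => "abc\nfoo",      # no trailing newline
    2 => "abc\nfoo\n",    # one string-ending newline
    3 => "abc\nfoo\n\n",  # two trailing newlines
);

my %result;
for my $n (sort keys %cases) {
    my $s = $cases{$n};
    $result{$n} = {
        default   => ($s =~ /\w+$/  ? 1 : 0),  # end, or before a string-ending newline
        end_only  => ($s =~ /\w+\z/ ? 1 : 0),  # only at the very end of the string
        multiline => ($s =~ /\w+$/m ? 1 : 0),  # before any newline, or at the end
    };
    printf "case %s: default=%d end_only=%d multiline=%d\n",
        $n, @{ $result{$n} }{qw(default end_only multiline)};
}
```

The default column reproduces the behaviour shown above (match, match, no match). With \z only case 1 matches, and with /m all three match because abc is followed by a newline in each string.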