regex, perl, end-of-line

How does Perl regexp anchor $ actually handle a trailing newline?


I recently discovered some unexpected behaviour of the end-of-string anchor $ in a Perl regular expression (Perl 5.26.1 x86_64 on openSUSE 15.2).

Supposedly, $ refers to the end of the string, not the end of a line as it does in grep(1). Hence a \n at the end of a string should have to be matched explicitly. However, the following (complete) program:

my @strings = ( 
  "hello world",
  "hello world\n",
  "hello world\t"
);
my $i = 0;
foreach (@strings) {
  $i++;
  print "$i: >>$_<<\n" if /d$/;
}

produces this output:

1: >>hello world<<
2: >>hello world
<<

i.e., /d$/ matches not only the first of the three strings but also the second, despite its trailing newline. On the other hand, as expected, the regexp /d\n$/ matches the second string only, and /d\s$/ matches the second and third.

What's going on here?


Solution

  • perlre states for the $ metacharacter:

    Match the end of the string
    (or before newline at the end of the string;
    or before any newline if /m is used)

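    This means that /d$/ matches both a string that ends in d and a string in which d is immediately followed by a single \n (newline) that ends the string. Without the /m modifier, $ behaves like \Z, i.e. like (?=\n?\z). To match only at the very end of the string, use \z instead of $.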
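As a rough illustration of the rule quoted from perlre, the following sketch compares $ with the \Z and \z anchors on the strings from the question. The helper matching_indices and the fourth test string (with two trailing newlines) are additions made here for illustration; they are not part of the original program.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the question): returns the 1-based
# positions of the strings that match the given compiled regex.
sub matching_indices {
    my ($re, @strings) = @_;
    return grep { $strings[$_ - 1] =~ $re } 1 .. @strings;
}

# The three strings from the question, plus one with two trailing
# newlines (an extra case assumed here to show the limit of $).
my @strings = (
    "hello world",
    "hello world\n",
    "hello world\t",
    "hello world\n\n",
);

my @patterns = (
    [ 'd$'   => qr/d$/   ],  # end of string, or before a final newline
    [ 'd\Z'  => qr/d\Z/  ],  # same behaviour as $ when /m is not used
    [ 'd\z'  => qr/d\z/  ],  # only the very end of the string
    [ 'd\n$' => qr/d\n$/ ],  # newline matched explicitly
);

for my $p (@patterns) {
    my ($label, $re) = @$p;
    my @hits = matching_indices($re, @strings);
    printf "%-6s matches strings: %s\n", $label, join(', ', @hits);
}
```

With these inputs, d$ and d\Z should report strings 1 and 2, d\z only string 1, and d\n$ strings 2 and 4. String 4 does not match d$ because only one trailing newline is tolerated before the end of the string.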