I'm working on text files with Windows line terminators (\r\n), on Linux with Perl v5.30.
What I don't understand is why, with these text files, capturing groups seem to capture nothing even though the regular expression matches.
Example:
$ echo $'Line1\r\nLine2\n' | perl -ne 'print /(.*)/'
Line2
$ echo $'Line1\r\nLine2\n' | perl -ne '/(.*)/ && print "match\n"'
match
match
match
Nothing from the first line appears to be captured, yet all three lines match.
Why is that?
Use cat -v or xxd to see what the output really contains.
$ echo $'Line1\r\nLine2\n' | perl -ne 'print /(.*)/' | cat -v
Line1^MLine2
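^M corresponds to \r (carriage return). The dot matches \r but not \n, so the \r is part of the first capture. When printed to a terminal, it moves the cursor back to the beginning of the line, so the second match overwrites the first one.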
This explains two matches, but where's the third one? Add something to separate the matches:
$ echo $'Line1\r\nLine2\n' | perl -ne 'print /(.*)/, "|"' | cat -v
Line1^M|Line2||
echo adds a newline to its output, so the last line is empty, but .* still matches it because it can match zero characters.