
How does grep handle DOS end of line?


I have a Windows text file that contains a single line, ending in CRLF:

aline

Here is the output of several commands:

[root@panel ~]# grep aline file.txt
aline
[root@panel ~]# grep aline$'\r' file.txt

[root@panel ~]# grep aline$'\r'$'\n' file.txt

[root@panel ~]# grep aline$'\n' file.txt
aline

The first command's output is as expected. I'm curious about the second and third: why is the output an empty line? As for the last command, I expected it not to find the string, but it does. Why? The commands were run in bash on CentOS.


Solution

  • In this case grep really does match the string "aline\r"; you just don't see it because it is overwritten by the ANSI escape sequences that grep prints for color. Pipe the output to od -c and you'll see:

    $ grep aline file.txt
    aline
    $ grep aline$'\r' file.txt
    
    $ grep aline$'\r' --color=never file.txt
    aline
    $ grep aline$'\r' --color=never file.txt | od -c
    0000000   a   l   i   n   e  \r  \n
    0000007
    $ grep aline$'\r' --color=always file.txt | od -c
    0000000 033   [   0   1   ;   3   1   m 033   [   K   a   l   i   n   e
    0000020  \r 033   [   m 033   [   K  \n
    0000030
    

    With --color=never the matched line is visible because grep prints no escape sequences: \r moves the cursor back to the start of the line, then the newline is printed, and nothing is overwritten. With --color=auto (typically enabled through a shell alias on CentOS), grep checks whether its output goes to a terminal or to a pipe, and on a terminal it prints the match in color. The od output shows that the color reset \033[m\033[K is emitted after the \r. \033[K erases from the cursor to the end of the line, and since the cursor is already back at the first column, the whole of "aline" is erased before \n is printed.
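The difference between the two modes can be reproduced without the original file. This is a minimal sketch, assuming bash and GNU grep; the scratch file from mktemp and the variable names are illustrative and not part of the question.

```shell
# Create a one-line file with a DOS (CRLF) line ending; tmpfile is a hypothetical scratch path.
tmpfile=$(mktemp)
printf 'aline\r\n' > "$tmpfile"

# Without color the matched line is passed through unchanged, including the \r.
plain=$(grep --color=never aline$'\r' "$tmpfile" | od -c)
printf '%s\n' "$plain"

# With color forced on, count the ESC bytes that grep adds around the match.
esc_count=$(grep --color=always aline$'\r' "$tmpfile" | tr -cd '\033' | wc -c)
echo "escape bytes: $esc_count"

rm -f "$tmpfile"
```

A non-zero count confirms that the match itself succeeds and that only the added escape sequences change what the terminal displays.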

    To have \n treated as ordinary data rather than as a line terminator, you can use the -z option, which makes NUL bytes the record separator:

    $ grep -z aline$'\r'$'\n' --color=never file.txt
    aline
    $ grep -z aline$'\r'$'\n' --color=never file.txt  | od -c
    0000000   a   l   i   n   e  \r  \n  \0
    0000010
    $ grep -z aline$'\r'$'\n' --color=always file.txt | od -c
    0000000 033   [   0   1   ;   3   1   m 033   [   K   a   l   i   n   e
    0000020  \r 033   [   m 033   [   K  \n  \0
    0000031
    

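    Your last command, grep aline$'\n' file.txt, finds the line because the newline never takes part in the match. bash passes aline$'\n' to grep as a single argument (text inside $'...' is quoted, so it is not word-split), and grep treats a newline inside its pattern argument as a separator between patterns rather than as a character to match. The pattern aline is therefore still searched for, just as in grep aline file.txt. The same applies to the third command, grep aline$'\r'$'\n' file.txt, which searches for aline\r like the second one. An unquoted expansion, by contrast, is word-split, so an expansion that carries a newline must be quoted. Note that command substitution also strips trailing newlines, so the pattern in this example is just aline: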

    $ echo "aline" | grep -z "aline$(echo $'\n')"
    aline
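Both shell behaviors involved here can be checked directly. This is a minimal sketch, assuming bash; the variable names are illustrative.

```shell
# $'...' is a quoted string in bash, so the newline does not split the word:
# the positional parameters are "aline<newline>" and "file.txt".
set -- aline$'\n' file.txt
arg_count=$#
first_len=${#1}
echo "arguments: $arg_count, length of first argument: $first_len"

# Command substitution removes trailing newlines, so nothing is appended here.
pattern="aline$(echo $'\n')"
pattern_len=${#pattern}
echo "length of substituted pattern: $pattern_len"
```

The first argument is one character longer than "aline" because the newline stays inside it, while the substituted pattern has the same length as "aline".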
    

    To demonstrate the effect of quoting on the third command, I added another line to the file:

    $ cat file.txt
    aline
    another line
    $ grep -z "aline$(echo $'\n')" file.txt | od -c
    0000000   a   l   i   n   e  \r  \n   a   n   o   t   h   e   r       l
    0000020   i   n   e  \n  \0
    0000025
    $ grep -z "aline$(echo $'\n')" file.txt
    aline
    another line
    $
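Both lines are printed because, with -z, records end at NUL bytes rather than at newlines, and a text file without NUL bytes is a single record. A quick check, assuming GNU grep; the input mirrors the two-line file above, and the pattern line is chosen only because it occurs on both lines.

```shell
# Count matches for the same two-line input in line mode and in -z (NUL-separated) mode.
lines_matched=$(printf 'aline\r\nanother line\n' | grep -c line)
records_matched=$(printf 'aline\r\nanother line\n' | grep -zc line)
echo "line mode: $lines_matched, -z mode: $records_matched"
```

Line mode counts two matching lines, while -z mode counts one matching record that contains the whole input.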