
Grepping non-printable characters in Cygwin


Grepping for non-printable characters doesn't seem to work for carriage returns (the control character shown as ^M).

usr@R923047 ~
$ head -3 test.ctl
row 1
row 2
row 3
usr@R923047 ~
$ head -3 test.ctl | cat -nv
     1  row 1^M
     2  row 2^M
     3  row 3
usr@R923047 ~
$ head -3 test.ctl | grep '[^[:print:]]'

usr@R923047 ~
$ head -3 test.ctl | grep '[[:cntrl:]]'

usr@R923047 ~
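To reproduce this without the original test.ctl, the sketch below builds an equivalent file. Its contents are an assumption inferred from the cat -nv output above (lines 1 and 2 end in CR/LF, line 3 in a plain LF), and the temporary directory is only for illustration. It assumes a shell with printf, mktemp, cat and od available, as in a standard Cygwin install.

```shell
#!/bin/sh
# Hypothetical reproduction of test.ctl, inferred from the cat -nv output:
# lines 1 and 2 end in CR/LF, line 3 ends in a plain LF.
tmp=$(mktemp -d)
printf 'row 1\r\nrow 2\r\nrow 3\n' > "$tmp/test.ctl"

# cat -v renders each CR as ^M, as in the transcript.
cat -nv "$tmp/test.ctl"

# od -c prints the raw bytes; each CR shows up as \r just before the \n.
od -c "$tmp/test.ctl"
```

The od dump makes it explicit that the CR bytes are really in the file, so any failure to match them comes from how grep reads the file, not from the data.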

Solution

  • According to the grep man page, you can specify -U or --binary to:

    Treat the file(s) as binary. By default, under MS-DOS and MS-Windows, grep guesses the file type by looking at the contents of the first 32KB read from the file. If grep decides the file is a text file, it strips the CR characters from the original file contents (to make regular expressions with ^ and $ work correctly). Specifying -U overrules this guesswork, causing all files to be read and passed to the matching mechanism verbatim; if the file is a text file with CR/LF pairs at the end of each line, this will cause some regular expressions to fail. This option has no effect on platforms other than MS-DOS and MS-Windows.

    So:

    $ head -3 test.ctl
    row 1
    row 2
    row 3
    $ head -3 test.ctl | cat -nv
         1  row 1^M
         2  row 2^M
         3  row 3
    $ head -3 test.ctl | grep '[^[:print:]]'
    
    $ head -3 test.ctl | grep '[[:cntrl:]]'
    
    $ head -3 test.ctl | grep -U '[^[:print:]]'
    row 1
    row 2
    
    $ head -3 test.ctl | grep -U '[[:cntrl:]]'
    row 1
    row 2
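A small helper built on the same -U option can list exactly which lines carry a CR. This is a sketch, not a grep feature: find_cr_lines is a hypothetical name, and the sample file is the same assumed reconstruction of test.ctl. It matches a literal CR byte rather than a character class, and obtains that byte with printf so it does not depend on bash-only $'\r' quoting.

```shell
#!/bin/sh
# Sketch: print the line number and content of every line containing a CR.
# find_cr_lines is a hypothetical helper name, not part of grep.
cr=$(printf '\r')   # a literal CR byte; $(...) strips only trailing newlines

find_cr_lines() {
    # -U: on Cygwin/Windows builds, keep grep from stripping CRs before matching
    #     (per the man page quoted above, it has no effect on other platforms).
    # -n: prefix each matching line with its line number.
    grep -U -n "$cr" "$@"
}

tmp=$(mktemp -d)
printf 'row 1\r\nrow 2\r\nrow 3\n' > "$tmp/test.ctl"

# The matched lines still end in CR; cat -v makes it visible as ^M.
find_cr_lines "$tmp/test.ctl" | cat -v
```

With the sample file this should report lines 1 and 2 only. Matching an explicit CR is narrower than [[:cntrl:]] or [^[:print:]], both of which also match other control characters such as tabs.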