regex bash grep windows-10 mingw-w64

grep for \x20-\x7E regex hex range not working in Windows 10 MINGW64 bash


My file test.csv

Col1,Col2,Col3,Col4
1,AAA,1,
2,BBB,0,
3,CCCÆ,,ttt
4,DDD,1,
5,EEE,0,

Expected output:

3,CCCÆ,,ttt

Tried:

grep -a "[^\x20-\x7e]+" test.csv
grep -a '[^\x20-\x7e]+' test.csv
grep "[^\x20-\x7e]+" test.csv
grep '[^\x20-\x7e]+' test.csv

I also tried the -P and -E flags, but none of them returned the result I want. In PowerShell, I ran

Select-String -Pattern '[^\x20-\x7E]+' test.csv

and it returned the expected result.

Could someone point me in the right direction for grep (GNU grep) 3.1 in MINGW64 bash on Windows 10? It was installed with Git for Windows from https://git-scm.com/download/win


Solution

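  • It appears that the POSIX BRE and ERE syntaxes in GNU grep do not support the \xXX notation: inside a bracket expression the backslash is taken literally, so [^\x20-\x7e] is not read as a hex range.

    You can use the -P option to enable the PCRE regex engine and then use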

    grep -P "[^\x{00}-\x{7E}]" file
    

    Or,

    grep -P "[^[:ascii:]]" file
    

    to find any line containing a non-ASCII character.

    Note that you cannot use the [^\x20-\x7E] range as is, because the CR (part of the CRLF line ending in Windows text files) will also be matched, so every line will match, except possibly the last one if it has no trailing line break. You can, however, add CR to the negated character class and use grep -P "[^\x{0D}\x{20}-\x{7E}]" file instead.
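
To see the points above in action, here is a minimal sketch that rebuilds the question's test.csv with CRLF line endings and counts matching lines for each pattern. It assumes a GNU grep built with PCRE support and that Æ is stored as UTF-8 (bytes \303\206); the original file's encoding is not stated. The variable names are only for illustration.

```shell
# Rebuild test.csv with Windows (CRLF) line endings in a temporary file.
# Assumption: "Æ" is UTF-8 encoded, written here as the octal bytes \303\206.
tmp=$(mktemp)
printf 'Col1,Col2,Col3,Col4\r\n1,AAA,1,\r\n2,BBB,0,\r\n3,CCC\303\206,,ttt\r\n4,DDD,1,\r\n5,EEE,0,\r\n' > "$tmp"

# Without -P, the backslash inside a bracket expression is literal,
# so [\x41] matches one of the characters \, x, 4, 1 and not "A".
literal_x=$(printf 'x\n' | grep -c '[\x41]' || true)
literal_A=$(printf 'A\n' | grep -c '[\x41]' || true)

# PCRE range without CR: the trailing CR of every line matches.
naive_count=$(grep -c -P '[^\x{20}-\x{7E}]' "$tmp" || true)

# PCRE range with CR excluded, and the [:ascii:] class:
# only the line containing the non-ASCII character matches.
fixed_count=$(grep -c -P '[^\x{0D}\x{20}-\x{7E}]' "$tmp" || true)
ascii_count=$(grep -c -P '[^[:ascii:]]' "$tmp" || true)

echo "literal_x=$literal_x literal_A=$literal_A"
echo "naive=$naive_count fixed=$fixed_count ascii=$ascii_count"
rm -f "$tmp"
```

With the six CRLF-terminated lines above, the naive pattern should count all 6 lines, while the other two patterns count only the CCCÆ line. The -c flag is used here so the result is a plain number regardless of how grep classifies the file.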