
Different awk results on Linux and mingw64 with CRLF line endings


On Linux:

echo -n $'boo\r\nboo\r\n' | awk $'BEGIN { RS="\\n" } {gsub("boo","foo"); print}' | cat -v

prints the expected output:

foo^M
foo^M

However, on mingw64 (Git Bash for Windows), the same command prints:

foo
foo

without the carriage returns.

I tried setting the record separator explicitly, in case the default differed between the two platforms, but awk on mingw64 still strips the carriage returns. How can I make awk on mingw64 behave the same way as on Linux? Note that the awk versions differ slightly (GNU Awk 4.0.2 on Linux and GNU Awk 4.2.1 on mingw64), but I wouldn't expect that to matter unless there is some kind of bug.

The carriage returns are being removed by awk specifically, since on mingw64 this:

echo -n $'boo\r\nboo\r\n' | cat -v

prints the expected output:

boo^M
boo^M

Solution

  • After searching for a while, I found this question, and from this answer:

    it's something done by the C libraries and to stop it happening you should set the awk BINMODE variable to 3

    I changed your code to:

    echo -n $'boo\r\nboo\r\n' | awk -v BINMODE=3 $'BEGIN { RS="\\n" } {gsub("boo","foo"); print}' | cat -v
    

    I tried it on Unix, Linux, macOS, and Windows; all of them produce this output:

    foo^M
    foo^M
    

    So -v BINMODE=3 is what you are looking for.
    NOTE: only passing -v BINMODE=3 as a switch before the program text works.
    Usually a variable can be passed to awk in three ways: with the -v switch, in the BEGIN block, or as an assignment after the program text and before the file names.
    In this case I tried all three, and only -v BINMODE=3 works.
    My guess is that it has something to do with awk's compiling process.

    Example (under cygwin on Windows):

    $ echo -n $'boo\r\nboo\r\n' | awk -v BINMODE=3 '1' | cat -v
    boo^M
    boo^M

    $ echo -n $'boo\r\nboo\r\n' | awk 'BEGIN{BINMODE=3}1' | cat -v
    boo
    boo

    $ echo -n $'boo\r\nboo\r\n' | awk '1' BINMODE=3 | cat -v
    boo
    boo
    

    On the other platforms mentioned, all three commands produce:

    boo^M
    boo^M
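
One way to apply the fix consistently is to wrap it in a small shell function. This is only a sketch under stated assumptions: the name awk_bin is hypothetical and not part of any tool, and it assumes an awk (such as gawk) that honors BINMODE on Windows builds and ignores it on POSIX systems. It uses printf instead of echo -n, and od -c instead of cat -v, purely so the bytes are produced and displayed unambiguously.

```shell
# Hypothetical helper (not a standard command): always pass BINMODE=3 with -v,
# the only placement that worked in the tests above.
awk_bin() {
    awk -v BINMODE=3 "$@"
}

# printf emits the CR/LF bytes the same way in every POSIX shell.
# od -c prints each byte, so a surviving carriage return shows up as \r.
printf 'boo\r\nboo\r\n' | awk_bin '{ gsub("boo", "foo"); print }' | od -c
```

If the carriage returns survive, the od -c output should show \r followed by \n after each foo. On Linux the wrapper should behave like plain awk, so the same script can be used on both platforms.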