bash, git, carriage-return

How to ignore '\r' while counting characters of several files with wc -m


I am trying to count the characters that were committed to git by the author "JohnJohnson" using this command:

wc -m $(git log --use-mailmap --no-merges --author="JohnJohnson" --name-only --pretty=format:"" | sort -u)

The problem is that it produces different results on Linux and Windows (git-bash), at least in part because a newline on Windows consists of two characters, '\r\n'. Is there a way to make wc -m ignore '\r' so that the same command gives consistent results on both OSs?


Solution

  • NOTE: While running dos2unix on each file before running wc -m should suffice, I'm assuming a) dos2unix is not available and/or b) OP may find there are other characters (besides \r) that need to be removed.


    Assuming the objective is to generate the same exact output as wc -m, one idea using a user-defined function:

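    my_wc () {
        local fname charcount=0 totcount=0
    
        for fname in "$@"                # quoted so each argument stays one file name
        do
            # delete carriage returns, then count the remaining characters
            charcount=$(tr -d '\r' < "$fname" | wc -m)
            echo "$charcount $fname"
            ((totcount+=charcount))
        done
    
        echo "$totcount total"
    }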
    

    Applying to OP's example:

    my_wc $(git log --use-mailmap --no-merges --author="JohnJohnson" --name-only --pretty=format:"" | sort -u)
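
    Because the command substitution above is split on whitespace, file names containing spaces would be broken apart, and files that appear in the history but have since been deleted would make the redirection fail. A minimal sketch of one way around this, assuming one file name per line on stdin (the name my_wc_stdin is hypothetical, and paths that git prints in quoted form are not handled):

```shell
# Hypothetical helper: read file names from stdin, one per line, and print
# CR-free character counts in the same layout as `wc -m`.
my_wc_stdin () {
    local fname charcount totcount=0

    while IFS= read -r fname
    do
        # skip blank lines and files that no longer exist in the working tree
        [ -f "$fname" ] || continue

        # arithmetic expansion drops the padding some wc implementations print
        charcount=$(( $(tr -d '\r' < "$fname" | wc -m) ))
        echo "$charcount $fname"
        totcount=$((totcount + charcount))
    done

    echo "$totcount total"
}
```

    It would be fed from the same pipeline instead of receiving arguments:

    git log --use-mailmap --no-merges --author="JohnJohnson" --name-only --pretty=format:"" | sort -u | my_wc_stdin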
    

    If OP finds additional characters (besides \r) to skip, add them to the tr -d '\r' call.
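
    As an illustration of widening the set of skipped characters, here is a sketch of a variant that takes the tr delete set as its first argument. The name my_wc_skip and the argument order are assumptions made for this example, not part of the original function:

```shell
# Hypothetical variant of my_wc: the first argument is the set of characters
# that tr should delete before counting; the remaining arguments are files.
my_wc_skip () {
    local skip=$1 fname charcount totcount=0
    shift

    for fname in "$@"
    do
        # tr interprets escapes such as \r and \t inside the delete set;
        # arithmetic expansion drops the padding some wc implementations print
        charcount=$(( $(tr -d "$skip" < "$fname" | wc -m) ))
        echo "$charcount $fname"
        totcount=$((totcount + charcount))
    done

    echo "$totcount total"
}
```

    For example, my_wc_skip '\r\t' f? would ignore both carriage returns and tabs, while my_wc_skip '\r' f? should behave like my_wc f?.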


    Another function idea, this one using awk (RS="^$" relies on GNU awk treating RS as a regular expression):

    my_wc() {
        awk 'BEGIN { RS="^$" }                  # whole file becomes one single, long record
                   { gsub(/\r/,"")              # strip carriage returns
                     n=length($0)
                     tot+=n
                     print n,FILENAME
                   }
             END   { print tot+0,"total"}' "$@"
    }
    

    Demonstrating these functions on a few sample files:

    $ head f?
    ==> f1 <==
    a       13
    a       5
    b       7
    a       20
    a       3
    
    ==> f2 <==
    a       13
    a       5
    b       7
    a       20
    a       3
    
    ==> f3 <==
    a       13
    a       5
    b       7
    a       20
    a       3
    
    $ dos2unix f?
    
    $ wc -m f?
    22 f1
    22 f2
    22 f3
    66 total
    
    $ unix2dos f?
    
    $ wc -m f?
    27 f1
    27 f2
    27 f3
    81 total
    
    $ my_wc f?
    22 f1
    22 f2
    22 f3
    66 total