I am trying to count the characters that were committed to git by the author "JohnJohnson" using this command:
wc -m $(git log --use-mailmap --no-merges --author="JohnJohnson" --name-only --pretty=format:"" | sort -u)
The problem is that it produces different results on Linux and on Windows (git-bash), at least partly because on Windows a new line consists of two characters, '\r\n'. Is there a way to make wc -m ignore '\r' so that I get consistent results on both OSs with the same command?
NOTE: While running dos2unix on each file before running wc -m should suffice, I'm assuming a) dos2unix is not available and/or b) OP may find there are other characters (besides \r) that need to be removed.
Assuming the objective is to generate the same exact output as wc -m, one idea using a user-defined function:
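my_wc () {
    local charcount=0 totcount=0 fname
    for fname in "$@"        # quoted so each file name stays one word
    do
        # delete carriage returns, then count what is left
        charcount=$(tr -d '\r' < "$fname" | wc -m)
        echo "$charcount $fname"
        ((totcount+=charcount))
    done
    echo "$totcount total"
}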
Applying to OP's example:
my_wc $(git log --use-mailmap --no-merges --author="JohnJohnson" --name-only --pretty=format:"" | sort -u)
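One caveat with the call above: the unquoted $(...) splits file names on whitespace, and git log can also list paths that no longer exist in the working tree. A minimal sketch of a variant that reads one name per line from stdin instead; my_wc_stdin is a hypothetical name, and the sketch assumes bash and that each path resolves from the current directory:

```shell
# Hypothetical helper (not part of the original answer): read one file name
# per line from stdin so that names containing spaces stay intact.
my_wc_stdin () {
    local charcount=0 totcount=0 fname
    while IFS= read -r fname
    do
        # skip empty lines and paths that are not regular files
        # (for example files that were deleted in a later commit)
        [ -f "$fname" ] || continue
        # $(( ... )) also trims any padding some wc implementations print
        charcount=$(( $(tr -d '\r' < "$fname" | wc -m) ))
        echo "$charcount $fname"
        totcount=$(( totcount + charcount ))
    done
    echo "$totcount total"
}
```

It would be fed from the same pipeline, for example: git log --use-mailmap --no-merges --author="JohnJohnson" --name-only --pretty=format:"" | sort -u | my_wc_stdin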
If OP finds additional characters (besides \r) to skip, then add them to the tr -d '\r' call.
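As a rough illustration of adding characters to the tr -d set, here is a sketch that takes the set as its first argument. The name my_wc_strip and the choice of form feed (\f) as the extra character are assumptions made for the example, not something OP asked for:

```shell
# Hypothetical variant of my_wc: the first argument is the set of characters
# to delete (in tr syntax), the remaining arguments are file names.
my_wc_strip () {
    local strip_set=$1 charcount=0 totcount=0 fname
    shift
    for fname in "$@"
    do
        # delete every character in strip_set, then count what is left;
        # $(( ... )) also trims any padding some wc implementations print
        charcount=$(( $(tr -d "$strip_set" < "$fname" | wc -m) ))
        echo "$charcount $fname"
        totcount=$(( totcount + charcount ))
    done
    echo "$totcount total"
}
```

For example, my_wc_strip '\r\f' f? would ignore both carriage returns and form feeds, while my_wc_strip '\r' f? behaves like my_wc.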
Another function idea, this one using awk:
my_wc() {
    # NOTE: RS="^$" is a GNU awk idiom, so this assumes awk is gawk
    awk 'BEGIN { RS="^$" }       # whole file becomes one single, long record
         { gsub("\r","")         # drop carriage returns
           n=length($0)
           tot+=n
           print n,FILENAME
         }
         END { print tot,"total" }' "$@"
}
Demonstrating these functions on a few sample files:
$ head f?
==> f1 <==
a 13
a 5
b 7
a 20
a 3
==> f2 <==
a 13
a 5
b 7
a 20
a 3
==> f3 <==
a 13
a 5
b 7
a 20
a 3
$ dos2unix f?
$ wc -m f?
22 f1
22 f2
22 f3
66 total
$ unix2dos f?
$ wc -m f?
27 f1
27 f2
27 f3
81 total
$ my_wc f?
22 f1
22 f2
22 f3
66 total