My input file, table:
chr10 10482 10484 0 11 + CA
chr10 10486 10488 0 12 + ca
chr10 10487 10489 0 13 + Ca
chr10 10490 10492 0 13 + cA
chr10 10491 10493 0 12 + CT
chr10 10494 10496 6.66667 15 + ca
chr10 10495 10497 6.66667 15 + cc
I would like to count the number of lines where "CA" can be found in column 7, regardless of whether either of the two letters is upper or lower case.
The desired output would be 5.
The two commands below give empty output:
cat table | awk ' $7 ==/^[Cc][Aa]/{++count} END {print count}'
awk 'BEGIN {IGNORECASE = 1} $7==/"CA"/ {++count} END {print count}' table
The command below returns a value of 1:
awk 'BEGIN {IGNORECASE = 1} END {if ($7=="CA"){++count} {print count}}' table
Note: my actual table is tens of millions of lines long, so I do not want to write an intermediate table just to count. (I need to repeat this task for other files too.)
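There is a small problem in your syntax: you should write either var == "string" or var ~ /regexp/, but you are writing var == /regexp/. Also, inside /.../ the double quotes are ordinary characters, so /"CA"/ looks for "CA" including the quotes. Using the correct combination makes your command work: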
$ awk '$7 ~ /^[Cc][Aa]/{++count} END {print count+0}' file
5
$ awk 'BEGIN {IGNORECASE = 1} $7=="CA" {++count} END {print count+0}' file
5
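To see why the original commands printed nothing: in awk, a bare /regexp/ used as a value means ($0 ~ /regexp/), which is 1 or 0 depending on the whole line. So $7 == /^[Cc][Aa]/ compares column 7 with 0 or 1, never with "CA", count is never incremented, and print count outputs an empty line. The sketch below prints the three values for the first line of the table; show_regex_value is a hypothetical helper name, and gawk's warning about a regexp on the right of a comparison is sent to /dev/null.

```shell
#!/bin/sh
# show_regex_value (hypothetical helper) prints, for one input line:
#   1) a bare /regexp/ used as a value, which awk evaluates as ($0 ~ /regexp/)
#   2) $7 == /regexp/, i.e. column 7 compared with that 0 or 1
#   3) $7 ~ /regexp/, the intended match against column 7 only
show_regex_value() {
    printf '%s\n' "$1" | awk '{
        a = (/^[Cc][Aa]/)           # whole line starts with "ch", so 0
        b = ($7 == /^[Cc][Aa]/)     # "CA" compared with 0 as strings, so 0
        c = ($7 ~ /^[Cc][Aa]/)      # column 7 matches the regexp, so 1
        print a, b, c
    }' 2>/dev/null                  # hide gawk warnings about regexp values
}

show_regex_value 'chr10 10482 10484 0 11 + CA'   # prints: 0 0 1
```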
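Also, you may want to use toupper() (or tolower()) for this check instead of the IGNORECASE variable. IGNORECASE is a GNU awk (gawk) extension and has no effect in other awks such as mawk, whereas toupper() is standard: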
awk 'toupper($7) == "CA" {++count} END {print count+0}' file
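Since the question mentions tables of tens of millions of lines and several files, here is a minimal sketch, assuming a POSIX shell and any awk, that wraps the toupper() command in a loop over files. The name count_ca and the "<file> <count>" output format are hypothetical choices for illustration. awk reads each file as a stream and keeps only the counter, so nothing is written to disk and memory use does not grow with file size.

```shell
#!/bin/sh
# count_ca (hypothetical helper): for each file argument, print
# "<file> <count>", where <count> is the number of lines whose 7th column
# equals "CA" in any letter case. No intermediate files are written.
count_ca() {
    for f in "$@"; do
        # Same test as the toupper() command; count + 0 prints 0 on no match.
        n=$(awk 'toupper($7) == "CA" { ++count } END { print count + 0 }' "$f") || return 1
        printf '%s %s\n' "$f" "$n"
    done
}

# Usage (file names are placeholders):
# count_ca table other_table
```

Running one awk process per file keeps each count independent and reports 0 for files with no matches, including empty ones.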
Note the trick of printing count+0 instead of just count. Adding 0 forces a numeric conversion, so a variable that was never set becomes 0. With this, the command prints 0 when there are no matches; printing just count would output an empty string.
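A quick way to see the difference, using the last line of the table (its 7th column is cc, so nothing matches and count is never set). The variable names without_zero and with_zero are illustrative only.

```shell
#!/bin/sh
# Input whose 7th column is not "CA" in any case, so count stays unset.
line='chr10 10495 10497 6.66667 15 + cc'

# Printing the unset variable as-is gives an empty string.
without_zero=$(printf '%s\n' "$line" | awk 'toupper($7) == "CA" { ++count } END { print count }')

# Adding 0 forces a numeric conversion: the unset variable becomes 0.
with_zero=$(printf '%s\n' "$line" | awk 'toupper($7) == "CA" { ++count } END { print count + 0 }')

printf 'print count     -> [%s]\n' "$without_zero"   # -> []
printf 'print count + 0 -> [%s]\n' "$with_zero"      # -> [0]
```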