I have a tab-delimited table that looks like this:
chr1 100 110 + 2 3 0 8 6
chr1 150 200 + 1 4 0 2 0
chr1 200 220 + 1 4 2 0 0
chr1 250 260 + 4 2 6 1 3
I would like to count how many zeros are in columns 5-9 and add that number to column 10:
chr1 100 110 + 2 3 0 8 6 1
chr1 150 200 + 1 4 0 2 0 2
chr1 200 220 + 1 4 2 0 0 2
chr1 250 260 + 4 2 6 1 3 0
Ultimately, the goal is to subset only those rows with no more than 4 zeros (at least 2 columns being non-zero). I know how to do this subset with awk
but I don't know how to count the zeros in those columns. If there is a simpler way to just require that at least two columns be non-zero between columns 5-9 that would be ideal.
rethab's answer perfectly answers your first requirement of adding an extra column. This answers your second requirement (print only lines with less than 4 zeros). With awk (tested with GNU awk), simply count the non-zero fields between field 5 et field 9 (variable nz
), and print only if it is greater or equal 2:
$ cat foo.txt
chr1 100 110 + 2 3 0 8 6
chr1 150 200 + 1 4 0 2 0
chr1 250 260 + 0 0 0 1 0
chr1 200 220 + 1 4 2 0 0
chr1 250 260 + 4 2 6 1 3
$ awk '{nz=0; for(i=5;i<=9;i++) nz+=($i!=0)} nz>=2' foo.txt
chr1 100 110 + 2 3 0 8 6
chr1 150 200 + 1 4 0 2 0
chr1 200 220 + 1 4 2 0 0
chr1 250 260 + 4 2 6 1 3