Is it possible to count the number of lines in each column of a file? For example, I've been trying to use awk to separate columns on the semicolon, select each column individually, and use the wc command to count all occurrences within that column.
With the command below I am trying to find the number of items in column 3 without counting blank lines. Unfortunately, this command just counts every line in the file. I could move the column to a different file and count that file, but is there a quicker way of going about this?
awk -F ';' '{print $3}' file.txt | wc -l
Data file format
; 1 ; 2 ; 3 ; 4 ; 5 ; 6 ;
; 3 ; 4 ; 5 ; 6 ; ; 4 ;
; ; 3 ; 5 ; 6 ; 9 ; 8 ;
; 1 ; 6 ; 3 ; ; ; 4 ;
; 2 ; 3 ; ; 3 ; ; 5 ;
Example output wanted
Column 1 = 4 (the four non-blank entries: 1, 3, 1, 2)
Column 2 = 5
Column 3 = 4
Column 4 = 4
Column 5 = 2
Column 6 = 5
Keep separate counts for each field using an array, then print the totals when you're done:
$ awk -F' *; *' '{ for (i = 2; i < NF; ++i) if ($i != "") ++count[i] }
END { for (i = 2; i < NF; ++i) print "Column", i-1, "=", count[i] }' file
Column 1 = 4
Column 2 = 5
Column 3 = 4
Column 4 = 4
Column 5 = 2
Column 6 = 5
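It is tempting to write the test as if ($i), but this would fail for a column containing a 0.
The totals are printed in the END block, offsetting the index by -1 so that the column numbers start from 1 instead of 2 (field 1 is the empty string before the leading semicolon).
One assumption made here is that the number of columns in each line is uniform throughout the file, so that NF from the last line can safely be used in the END block.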
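If the rows cannot be assumed to have the same number of fields, one option is to remember the largest NF seen and loop up to that in the END block. The following is only a sketch under that assumption: the variable name maxnf and the two-line sample input are made up for illustration, and count[i] + 0 is used so that a column with no entries prints 0 rather than an empty string.

```shell
# Made-up sample input: the second row has fewer fields than the first.
result=$(printf '%s\n' '; 1 ; ; 3 ;' '; 4 ; ;' |
  awk -F' *; *' '
    { if (NF > maxnf) maxnf = NF              # maxnf: widest row seen so far
      for (i = 2; i < NF; ++i) if ($i != "") ++count[i] }
    END { for (i = 2; i < maxnf; ++i) print "Column", i-1, "=", count[i] + 0 }
  ')
printf '%s\n' "$result"
```

For this input the sketch reports 2 for column 1, 0 for column 2 and 1 for column 3.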
A slight variation, using a simpler field separator:
$ awk -F';' '{ for (i = 2; i < NF; ++i) count[i] += ($i ~ /[^ ]/) }
END { for (i = 2; i < NF; ++i) print "Column", i-1, "=", count[i] }' file
$i ~ /[^ ]/ is equal to 1 if any non-space characters exist in the ith field, 0 otherwise.
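As a quick check of that behaviour, the value of the match expression can be printed for each field. This is just an illustrative sketch on a made-up input line whose three fields hold a letter, only spaces, and a 0.

```shell
# Made-up input: field 1 is "a", field 2 is only spaces, field 3 is " 0 ".
flags=$(printf '%s\n' 'a;   ; 0 ' |
  awk -F';' '{ for (i = 1; i <= NF; ++i) printf "%d", ($i ~ /[^ ]/); print "" }')
printf '%s\n' "$flags"
```

This prints 101: the field holding only spaces contributes 0, while the field holding 0 still counts, which is where a plain if ($i) test would go wrong.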