
Counting the number of lines in each column


Is it possible to count the number of lines in each column of a file? For example, I've been trying to use awk to split columns on the semicolon, select each column individually, and use the wc command to count all occurrences within that column.
For the command below, I am trying to find the number of items in column 3 without counting blank lines. Unfortunately, this command just counts every line in the file. I could move the column to a different file and count that file, but is there a quicker way of going about this?

awk -F ';' '{print $3}' file.txt | wc -l

Data file format

; 1 ; 2 ; 3 ; 4 ; 5 ; 6 ;  
; 3 ; 4 ; 5 ; 6 ;   ; 4 ;  
;   ; 3 ; 5 ; 6 ; 9 ; 8 ;  
; 1 ; 6 ; 3 ;   ;   ; 4 ;  
; 2 ; 3 ;   ; 3 ;   ; 5 ;  

Example output wanted

Column 1 = 4 (four non-blank entries: 1, 3, 1, 2)  
Column 2 = 5  
Column 3 = 4  
Column 4 = 4  
Column 5 = 2  
Column 6 = 5 

Solution

  • Keep separate counts for each field using an array, then print the totals when you're done:

    $ awk -F' *; *' '{ for (i = 2; i < NF; ++i) if ($i != "") ++count[i] }
      END { for (i = 2; i < NF; ++i) print "Column", i-1, "=", count[i] + 0 }' file
    Column 1 = 4
    Column 2 = 5
    Column 3 = 4
    Column 4 = 4
    Column 5 = 2
    Column 6 = 5
    
    • Set the field separator to consume the semicolons as well as any surrounding spaces.
    • Loop through each field (except the first and last ones, which are always empty) and increment a counter for non-empty fields.
      • It would be tempting to use if ($i), but this would fail for a column containing a 0, because awk treats a numeric-looking 0 as false.
    • Print the counts in the END block, offsetting by -1 to start from 1 instead of 2.
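The pitfall with `if ($i)` mentioned above can be seen on a single line that contains a 0. This is only a small sketch: the one-line input and the variable names `line`, `truthy` and `nonempty` are made up for illustration and are not part of the original data.

```shell
# Hypothetical one-line input whose first data column is 0.
line='; 0 ; 7 ;'

# Truthiness test: a field that looks like the number 0 is false, so it is skipped.
truthy=$(printf '%s\n' "$line" |
  awk -F' *; *' '{ for (i = 2; i < NF; ++i) if ($i) ++n } END { print n + 0 }')

# Explicit comparison with the empty string: the 0 is counted as well.
nonempty=$(printf '%s\n' "$line" |
  awk -F' *; *' '{ for (i = 2; i < NF; ++i) if ($i != "") ++n } END { print n + 0 }')

echo "if (\$i) counts $truthy field(s); if (\$i != \"\") counts $nonempty field(s)"
```

The first command counts only the 7, while the second counts both the 0 and the 7.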

    One assumption made here is that the number of columns in each line is uniform throughout the file, so that NF from the last line can safely be used in the END block.
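If that assumption might not hold, one possible workaround is to remember the largest NF seen while reading and use it in the END block. The following is a minimal sketch rather than a definitive version: the variable `max`, the function name `count_columns` and the temporary file are introduced here for illustration, and the `+ 0` forces a column with no entries to print as 0.

```shell
# Sample data from the question, written to a temporary file (illustrative only).
data=$(mktemp)
cat > "$data" <<'EOF'
; 1 ; 2 ; 3 ; 4 ; 5 ; 6 ;
; 3 ; 4 ; 5 ; 6 ;   ; 4 ;
;   ; 3 ; 5 ; 6 ; 9 ; 8 ;
; 1 ; 6 ; 3 ;   ;   ; 4 ;
; 2 ; 3 ;   ; 3 ;   ; 5 ;
EOF

# count_columns FILE: print the number of non-empty entries per column.
count_columns() {
  awk -F' *; *' '
    {
      if (NF > max) max = NF            # widest line seen so far
      for (i = 2; i < NF; ++i)
        if ($i != "") ++count[i]        # skip empty fields, keep zeros
    }
    END {
      for (i = 2; i < max; ++i)
        print "Column", i-1, "=", count[i] + 0
    }
  ' "$1"
}

count_columns "$data"
```

On the sample data this prints the same six lines as the original command.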


    A slight variation, using a simpler field separator:

    $ awk -F';' '{ for (i = 2; i < NF; ++i) count[i] += ($i ~ /[^ ]/) } 
      END { for (i = 2; i < NF; ++i) print "Column", i-1, "=", count[i] }' file
    

    $i ~ /[^ ]/ evaluates to 1 if the ith field contains any non-space character, and to 0 otherwise.
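To see what the match expression contributes to each counter, it can be printed directly for one line of the sample data. This is an illustrative sketch; the variable name `flags` is an assumption made here.

```shell
# Print the value of ($i ~ /[^ ]/) for each data field of the second sample line.
flags=$(printf '%s\n' '; 3 ; 4 ; 5 ; 6 ;   ; 4 ;' |
  awk -F';' '{
    for (i = 2; i < NF; ++i)
      printf "%d%s", ($i ~ /[^ ]/), (i < NF - 1 ? " " : "\n")
  }')

echo "$flags"
```

The blank fifth column yields 0 and every other column yields 1. Note that /[^ ]/ only excludes the space character, so a field holding nothing but tabs would still count; /[^[:space:]]/ is one alternative if that matters.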