How to count the number of rows in multiple files based on a condition in Linux


I'm trying to count the rows that match a condition across multiple files and then summarize the results. The condition is a value greater than 60.

> sample1.txt

name  value
A001     27
A002     54
A003     39
A004     81
A005     88


> sample2.txt

name  value
B001     46
B002     92
B003     79
B004     67
B005     66

> sample3.txt

name  value
C001     12
C002     39
C003     83
C004     79
C005     27

Desired output:

   file  Count
sample1      2
sample2      4
sample3      2

I have tried:

awk '$2>60{c++} END{print c+0}' sample1.txt

but this also counts the header line, and I'm stuck on how to summarize all the files.


Solution

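  • Awk decides whether to compare $2 as a number or as a string from the field's content. On the header line $2 is the string "value", so $2>60 is evaluated as a string comparison, which is true here, and the header gets counted. Writing $2+0 forces a numeric comparison; a non-numeric field such as "value" then evaluates to 0.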
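
    As a quick check of the point about strings and numbers, the sketch below feeds awk one made-up header-like line and compares $2 to 60 with and without the +0. The variable names string_cmp and numeric_cmp are only for illustration.

    ```shell
    # $2 is the string "value", so $2 > 60 is compared as strings and is true (1).
    string_cmp=$(printf 'name value\n' | awk '{ print ($2 > 60) }')
    # Adding 0 forces a numeric comparison: "value"+0 is 0, so the result is false (0).
    numeric_cmp=$(printf 'name value\n' | awk '{ print ($2+0 > 60) }')
    echo "string: $string_cmp  numeric: $numeric_cmp"
    ```

    The parentheses after print matter: without them, awk would treat > 60 as a redirection to a file named 60.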

    Next, you only have one counter, c, so the script can only report a single count at the end. To get one count per file, use an array indexed by FILENAME.

    script.awk:

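    # Make sure every file has an entry, even if none of its rows match.
    !(FILENAME in a){a[FILENAME]=0}
    # Count rows whose value, forced to a number, is greater than 60.
    $2+0>60{a[FILENAME]++}
    END{
        print "file", "count"
        for(key in a) print key, a[key]
    }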
    

    The line !(FILENAME in a){a[FILENAME]=0} is useful when a file has no row matching the condition; without it, that file would not show up in the summary.

    If you need a total, you can add a counter either in the main rule or in the for loop of the END block. In both examples below, c ends up with the same value.

    !(FILENAME in a){a[FILENAME]=0}
    $2+0>60{a[FILENAME]++} 
    END{
        print "file", "count"
        for(key in a){
            print key, a[key]
            c+=a[key]
        }
        print "total", c
    }
    

    or

    !(FILENAME in a){a[FILENAME]=0}
    $2+0>60{a[FILENAME]++; c++} 
    END{
        print "file", "count"
        for(key in a) print key, a[key]
        print "total", c+0
    }
    

    You can save any of these as script.awk and run it like this:

    awk -f script.awk sample1.txt sample2.txt sample3.txt
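
Two details of the scripts above are worth noting: for (key in a) visits the keys in an unspecified order, and the file names are printed with their .txt extension, unlike the desired output. A possible variant is sketched below as a hypothetical shell function, count_over_60 (the function name and the array names order and count are assumptions, not part of the original answer). It remembers the order in which the files were given, strips a trailing .txt, and skips each header with FNR == 1, which assumes every file starts with a header line.

```shell
# Hypothetical helper: per-file count of rows whose second column exceeds 60.
count_over_60() {
    awk '
        FNR == 1 {                    # first line of each file, assumed to be the header
            name = FILENAME
            sub(/\.txt$/, "", name)   # drop the extension for display
            order[++n] = name         # remember the order the files were given in
            count[name] = 0           # files with no matching row still print 0
            next                      # do not test the header line
        }
        $2+0 > 60 { count[name]++ }
        END {
            print "file", "Count"
            for (i = 1; i <= n; i++) print order[i], count[order[i]]
        }
    ' "$@"
}
```

With the three sample files from the question in the current directory, count_over_60 sample1.txt sample2.txt sample3.txt should print the header line followed by sample1 2, sample2 4 and sample3 2.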