bash, awk, pipe

Running awk on multiple files and piping the output of each run to wc separately


I have a bunch of record-oriented CSV (.csv) files. In every file, the first field is either an integer or empty. I want to count the records whose first field is empty in each file, and then plot those counts across all the files.

File format of filename.csv:

123456,few,other,fields
,few,other,fields 
234567,few,other,fields

I want something like

awk -F, '$1==""' `ls` | (wc -l for each file separately) | gnuplot (y axis: the output of wc -l; x axis: simply 1 to n, where n is the number of csv files)

The problem I am facing is that wc -l gets executed only once, for all the files together. I want to run wc -l for each file, count the records with an empty first field, and pass this sequence of counts to gnuplot. Once I have the required count for each file I am almost done, since

seq 10 | gnuplot -p -e "plot '<cat'"

works fine


Solution

  • You could use awk to keep track of the count for each file in an array. Then at the end print the contents of the array:

      awk -F"," '$1==""{a[FILENAME]+=1} END{for(file in a) { print file, a[file] }}' `ls`
    

    This way you don't have to tangle with wc; you can just send the output straight to gnuplot.

    Example in use:

    $> cat file1
    ,test
    2,test
    3,
    $> cat file2
    ,test
    2,test
    3,
    ,test
    $> awk -F"," '$1==""{a[FILENAME]+=1} END{for(file in a) { print file, a[file] }}' `ls`
    file1 1
    file2 2
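
One caveat with the array approach: a file with no empty first fields never gets an array entry, so it is missing from the output, and `for (file in a)` visits the files in an unspecified order. If the x axis is meant to be file 1 to n, a per-file loop may be easier to reason about. The following is only a minimal sketch under that assumption; the function name `count_empty_first` and the sample file names are hypothetical, not part of the original answer.

```shell
#!/bin/sh
# count_empty_first: hypothetical helper name, chosen for illustration.
# For each file argument, print the number of records whose first
# comma-separated field is empty. One line per file, in argument order.
count_empty_first() {
    for f in "$@"; do
        # "n + 0" forces a numeric 0 when no record matched
        awk -F, '$1 == "" { n++ } END { print n + 0 }' "$f"
    done
}

# Example with two throwaway files in a temporary directory
tmpdir=$(mktemp -d)
printf ',test\n2,test\n3,\n' > "$tmpdir/file1.csv"
printf ',test\n2,test\n3,\n,test\n' > "$tmpdir/file2.csv"
count_empty_first "$tmpdir"/*.csv
rm -rf "$tmpdir"
```

Because the output is one bare count per line, it should be possible to pipe it to the gnuplot invocation from the question in place of `seq 10`, for example `count_empty_first *.csv | gnuplot -p -e "plot '<cat'"`.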