
AWK Threshold Greater Than


I have text files in a folder that look something like this:

[13]pkt_size=140
[31]pkt_size=139
[49]pkt_size=139
[67]pkt_size=140
[85]pkt_size=139
[103]pkt_size=139
[121]pkt_size=140
[139]pkt_size=139
[157]pkt_size=139
[175]pkt_size=140
[193]pkt_size=139
[211]pkt_size=139
[229]pkt_size=3660
[253]pkt_size=140
[271]pkt_size=139
[289]pkt_size=139
[307]pkt_size=5164
[331]pkt_size=140
[349]pkt_size=139
[367]pkt_size=139
[385]pkt_size=7512

I want to set threshold=1000, have the script sum every 10 lines in the file, and print the sum only if it is greater than the threshold.

I also want to run the script over every file in the folder, and it must create a separate output file for each input file.


Solution

  • This script sums every 10 lines and prints the result if it is over 1000:

    $ cat sum.awk 
    BEGIN {
        FS = "="
    }
    { acc += $2 }
    (NR % 10) == 0 { if (acc > 1000) { print acc } acc = 0; }
    $ awk -f sum.awk yourfile.txt 
    1394
    9938
    $ 
    

    If you want the 1000 threshold to be a parameter, you can choose how to pass parameters to awk. For instance, you can use -v var=val on the command line, as described here: https://www.gnu.org/software/gawk/manual/gawk.html#Options
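As a minimal sketch of the -v approach: the variable name threshold and the temporary sample file are assumptions for illustration, not part of the original sum.awk. The sample is the first 20 lines of the data from the question.

```shell
# Sample input: the first 20 lines of the data shown in the question.
sample=$(mktemp)
cat > "$sample" <<'EOF'
[13]pkt_size=140
[31]pkt_size=139
[49]pkt_size=139
[67]pkt_size=140
[85]pkt_size=139
[103]pkt_size=139
[121]pkt_size=140
[139]pkt_size=139
[157]pkt_size=139
[175]pkt_size=140
[193]pkt_size=139
[211]pkt_size=139
[229]pkt_size=3660
[253]pkt_size=140
[271]pkt_size=139
[289]pkt_size=139
[307]pkt_size=5164
[331]pkt_size=140
[349]pkt_size=139
[367]pkt_size=139
EOF

# Same logic as sum.awk, but compared against the variable "threshold"
# (an assumed name) instead of the hard-coded 1000.
prog='{ acc += $2 }
NR % 10 == 0 { if (acc > threshold) print acc; acc = 0 }'

low=$(awk -F= -v threshold=1000 "$prog" "$sample")    # both groups exceed 1000
high=$(awk -F= -v threshold=5000 "$prog" "$sample")   # only the second group exceeds 5000

printf 'threshold=1000:\n%s\nthreshold=5000:\n%s\n' "$low" "$high"
rm -f "$sample"
```

With threshold=1000 both 10-line sums (1394 and 9938) are printed; with threshold=5000 only 9938 is.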

    As for running the command on every file and producing an output file for each, xargs comes to the rescue. Here is a sample session:

    $ ls
    sum.awk  yourfile.txt  zzzzzzz.txt
    $ ls *.txt
    yourfile.txt  zzzzzzz.txt
    $ ls *.txt | xargs -L 1 -I {} /bin/bash -c 'awk -f sum.awk {} > {}.output'
    $ ls
    sum.awk  yourfile.txt  yourfile.txt.output  zzzzzzz.txt  zzzzzzz.txt.output
    $ 
    

    xargs runs the command once for every line of input. By default it tries to pack several input items into each execution; -L 1 limits it to one line per run. (With GNU xargs, -I {} already implies one line per run, so -L 1 is redundant here.)

    Next, the -I {} argument declares a placeholder string {} that is replaced by each input line (the filename).

    Finally, /bin/bash -c '<what to execute>' runs the awk script on the file and redirects the output. The shell is needed because the > redirection is shell syntax, not something xargs handles itself.
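The same per-file run can also be sketched with a plain shell loop instead of xargs, which avoids parsing ls output (that breaks on filenames containing spaces). The function name sum_folder, its arguments, and the demo files below are hypothetical, with made-up sample data; the awk logic is inlined rather than read from sum.awk so the example stands alone.

```shell
# Hypothetical helper: sum every 10 lines of each .txt file in directory $1
# and write sums above the threshold $2 (default 1000) to <file>.output.
sum_folder() {
    for f in "$1"/*.txt; do
        [ -e "$f" ] || continue          # glob matched nothing: skip
        awk -F= -v threshold="${2:-1000}" '
            { acc += $2 }
            NR % 10 == 0 { if (acc > threshold) print acc; acc = 0 }
        ' "$f" > "$f.output"
    done
}

# Demo with made-up data: 10 lines of 150 (sum 1500) and 10 lines of 50 (sum 500).
demo=$(mktemp -d)
for i in 1 2 3 4 5 6 7 8 9 10; do
    echo "[$i]pkt_size=150" >> "$demo/big.txt"
    echo "[$i]pkt_size=50"  >> "$demo/small.txt"
done

sum_folder "$demo" 1000
big=$(cat "$demo/big.txt.output")        # the one sum above the threshold
small=$(cat "$demo/small.txt.output")    # empty: 500 does not exceed 1000
echo "big: $big / small: $small"
rm -rf "$demo"
```

Each input file gets its own .output file, even when nothing exceeds the threshold, which matches the behaviour of the xargs version.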

    Hope it helps.