How to solve the bash grep error to get the average of rows


I have a data set that looks like the sample below. I want to compare the first two columns: if two rows have the same values in both columns, I want to average the other five columns of those rows and output a single row. If no other row shares the same first two columns, the row should be output unchanged.

15.0 -100.0 409.27 5.7103 0.0106 0.0062 652.04
15.0 -99.5 409.14 5.6868 0.0109 0.0065 652.02
15.0 -99.0 409.23 5.4866 0.0108 0.0063 651.47
15.0 -98.5 409.19 5.4588 0.0107 0.0063 651.34
15.0 -98.5 409.17 5.3488 0.0105 0.0062 651.30
15.0 -98.0 409.16 5.2951 0.0104 0.0061 651.24
15.0 -97.5 409.17 5.2647 0.0104 0.0061 651.22
15.0 -97.0 409.27 5.0288 0.0098 0.0056 650.60

I am using bash and I wrote this code

sort -k 1 -n -k 2 -n file1 > temp
awk '{print $1, $2}' temp | uniq > file2
while read line
do
    n=`grep -c "$line" temp`
    #echo $n
    grep -E "$line" temp > temp1
        awk -v n=$n '{sum+=$3} END {print sum/n}' temp1 > val1;
        awk -v n=$n '{sum+=$4} END {print sum/n}' temp1 > val2;
        awk -v n=$n '{sum+=$5} END {print sum/n}' temp1 > val3;
        awk -v n=$n '{sum+=$6} END {print sum/n}' temp1 > val4;
        awk -v n=$n '{sum+=$7} END {print sum/n}' temp1 > val5;
    
     echo $line $val1 $val2 $val3 $val4 $val5 >> Averages.dat;
done < file2

I keep getting the error below. It looks like this line throws it:

 n=`grep -c "$line" temp`

grep: invalid option -- '.'
Usage: grep [OPTION]... PATTERN [FILE]...
Try 'grep --help' for more information.

Can someone please guide me with this? Thank you in advance.


Solution

  • - is the character used to indicate an option to grep.

    grep allows multiple options to be grouped after a single -. (See Guideline 5: https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap12.html#tag_12_02)

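    The GNU version of grep also accepts -NUM as an alias for --context=NUM.

    Your data contains at least one line whose first field is a negative number. For such a line, "$line" begins with -, so grep parses it as a group of options rather than as a pattern. With GNU grep, the leading digits are taken as a context count, and the . that follows is not a valid option character, which produces the error you see.

    You can tell grep to treat an argument as a pattern by preceding it with -e.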

    Consider:

    $ seq -5 5 | grep -c 3
    2
    $ seq -5 5 | grep -c -3
    Usage: grep [OPTION]... PATTERNS [FILE]...
    Try 'grep --help' for more information.
    $ seq -5 5 | grep -c -3.0
    grep: invalid option -- '.'
    Usage: grep [OPTION]... PATTERNS [FILE]...
    Try 'grep --help' for more information.
    $ seq -5 5 | grep -c -e -3.0
    0
    $
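
Applied to the failing line from the question, the fix could look like the sketch below. The sample rows with a negative first field and the temporary file are made up for illustration, since the rows shown in the question all start with 15.0. Adding -F goes beyond the -e fix and is only a suggestion: it makes grep treat the key as a fixed string, so the . in the numbers no longer matches any character.

```shell
# Hypothetical sample: two of the three rows share the key "-15.0 -98.5".
tmp=$(mktemp)
printf '%s\n' \
    '-15.0 -99.0 409.23 5.4866 0.0108 0.0063 651.47' \
    '-15.0 -98.5 409.19 5.4588 0.0107 0.0063 651.34' \
    '-15.0 -98.5 409.17 5.3488 0.0105 0.0062 651.30' > "$tmp"

line='-15.0 -98.5'

# -e marks the next argument as the pattern, even though it starts with "-".
# -F treats the pattern as a fixed string rather than a regular expression.
n=$(grep -c -F -e "$line" "$tmp")
echo "$n"

rm -f "$tmp"
```

The same -e placement would apply to the grep -E "$line" temp call inside the loop. The match is still a substring match, so a key can also match longer values that contain it.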
    

    Note that your code is quite inefficient.

    Unless your data is huge, you can use a single awk call to do all the processing. Perhaps:

    awk '
        {
            k = $1 OFS $2
    
            n[k]++
    
            s3[k] += $3
            s4[k] += $4
            s5[k] += $5
            s6[k] += $6
            s7[k] += $7
        }
        END {
            for (k in n)
                print k,
                    s3[k]/n[k],
                    s4[k]/n[k],
                    s5[k]/n[k],
                    s6[k]/n[k],
                    s7[k]/n[k]
        }
    ' file1 >Averages.dat
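
If the number of value columns may change, the same idea can be written with a loop over the fields instead of one array per column. The following is only a sketch under that assumption: the sample rows are copied from the question, the temporary file and the variable names data, out, s, nf and row are illustrative, and the trailing sort is added because awk does not specify the order in which for (k in n) visits the keys.

```shell
# Sample rows copied from the question, written to a hypothetical temp file.
data=$(mktemp)
printf '%s\n' \
    '15.0 -99.0 409.23 5.4866 0.0108 0.0063 651.47' \
    '15.0 -98.5 409.19 5.4588 0.0107 0.0063 651.34' \
    '15.0 -98.5 409.17 5.3488 0.0105 0.0062 651.30' \
    '15.0 -98.0 409.16 5.2951 0.0104 0.0061 651.24' > "$data"

out=$(awk '
    {
        k = $1 OFS $2
        n[k]++
        # Accumulate every column from the third onwards.
        for (i = 3; i <= NF; i++) s[k, i] += $i
        if (NF > nf) nf = NF
    }
    END {
        for (k in n) {
            row = k
            for (i = 3; i <= nf; i++) row = row OFS s[k, i] / n[k]
            print row
        }
    }
' "$data" | sort -n -k1,1 -k2,2)

printf '%s\n' "$out"
rm -f "$data"
```

The two rows with the key 15.0 -98.5 collapse into one averaged row, and the other rows pass through with their own values. Numbers are printed with the default %.6g conversion, so trailing zeros are dropped.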
    

    If the input is huge, you may need to sort it first and track when the key changes, so that you don't have to hold the sums for every key in memory. Perhaps:

    sort -n -k1,1 -k2,2 file1 | awk '
        function p() { print pk, s3/n, s4/n, s5/n, s6/n, s7/n }
        {
            k = $1 OFS $2
    
            if (k!=pk) {
                if (NR>1) p()
                n = s3 = s4 = s5 = s6 = s7 = 0
            }
    
            n++
            s3 += $3
            s4 += $4
            s5 += $5
            s6 += $6
            s7 += $7
    
            pk = k
        }
        END { if (NR) p() }
    ' > Averages.dat
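
One way to check the key-change logic on a few rows is to wrap the pipeline in a shell function and feed it unsorted input. This is a sketch, not a replacement for the script above: avg_by_key is a hypothetical name, the rows are taken from the question, and the NR guard in END is an added assumption so that empty input prints nothing.

```shell
# avg_by_key is a hypothetical wrapper name; it reads rows on stdin.
avg_by_key() {
    sort -n -k1,1 -k2,2 | awk '
        function p() { print pk, s3/n, s4/n, s5/n, s6/n, s7/n }
        {
            k = $1 OFS $2
            # A new key means the previous group is complete.
            if (k != pk) {
                if (NR > 1) p()
                n = s3 = s4 = s5 = s6 = s7 = 0
            }
            n++
            s3 += $3; s4 += $4; s5 += $5; s6 += $6; s7 += $7
            pk = k
        }
        # Guard so that empty input prints nothing instead of dividing by zero.
        END { if (NR) p() }
    '
}

# Rows are deliberately out of order; sort groups equal keys together.
out=$(printf '%s\n' \
    '15.0 -98.5 409.19 5.4588 0.0107 0.0063 651.34' \
    '15.0 -99.0 409.23 5.4866 0.0108 0.0063 651.47' \
    '15.0 -98.5 409.17 5.3488 0.0105 0.0062 651.30' | avg_by_key)
printf '%s\n' "$out"
```

Only one group's sums are held at a time, which is why this form depends on the input being sorted by the two key columns.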