I have a data set that looks like the one below. I want to compare the first two columns of each row: if two rows have the same values in both columns, I want to average the other five columns of those rows and output a single row. If the first two columns do not match any other row, the row should be output as is.
15.0 -100.0 409.27 5.7103 0.0106 0.0062 652.04
15.0 -99.5 409.14 5.6868 0.0109 0.0065 652.02
15.0 -99.0 409.23 5.4866 0.0108 0.0063 651.47
15.0 -98.5 409.19 5.4588 0.0107 0.0063 651.34
15.0 -98.5 409.17 5.3488 0.0105 0.0062 651.30
15.0 -98.0 409.16 5.2951 0.0104 0.0061 651.24
15.0 -97.5 409.17 5.2647 0.0104 0.0061 651.22
15.0 -97.0 409.27 5.0288 0.0098 0.0056 650.60
I am using bash, and I wrote this code:
sort -k 1 -n -k 2 -n file1 > temp
awk '{print $1, $2}' temp | uniq > file2
while read line
do
n=`grep -c "$line" temp`
#echo $n
grep -E "$line" temp > temp1
awk -v n=$n '{sum+=$3} END {print sum/n}' temp1 > val1;
awk -v n=$n '{sum+=$4} END {print sum/n}' temp1 > val2;
awk -v n=$n '{sum+=$5} END {print sum/n}' temp1 > val3;
awk -v n=$n '{sum+=$6} END {print sum/n}' temp1 > val4;
awk -v n=$n '{sum+=$7} END {print sum/n}' temp1 > val5;
echo $line $val1 $val2 $val3 $val4 $val5 >> Averages.dat;
done < file2
I keep getting this error. It looks like this line throws the error.
n=`grep -c "$line" temp`
grep: invalid option -- '.'
Usage: grep [OPTION]... PATTERN [FILE]...
Try 'grep --help' for more information.
Can someone please guide me with this? Thank you in advance.
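A leading - is what marks an argument as an option to grep, and grep allows multiple options to be grouped after a single - (see Guideline 5: https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap12.html#tag_12_02). The GNU version of grep also allows -NUM as an alias for --context=NUM.
The error implies that your data contains at least one line whose first field is a negative number (none of the sample rows shown has one), so "$line" begins with - and grep tries to parse it as a group of options. With a value such as -3.0, GNU grep accepts -3 as --context=3 and then rejects the . that follows as an invalid option.
You can tell grep to treat an argument as a pattern by preceding it with -e.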
Consider:
$ seq -5 5 | grep -c 3
2
$ seq -5 5 | grep -c -3
Usage: grep [OPTION]... PATTERNS [FILE]...
Try 'grep --help' for more information.
$ seq -5 5 | grep -c -3.0
grep: invalid option -- '.'
Usage: grep [OPTION]... PATTERNS [FILE]...
Try 'grep --help' for more information.
$ seq -5 5 | grep -c -e -3.0
0
$
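Applied to a loop like yours, the fix could look like the sketch below. The sample rows with a negative first field are an assumption made only to reproduce the failure, and the variable names n_e, n_dd and n_F are illustrative. Besides -e, the standard -- end-of-options marker also works, and -F stops the . in the key from acting as a regex wildcard.

```shell
# Sample rows in the question's layout. The negative first field is an
# assumption, chosen so that "$line" starts with "-".
tmp=$(mktemp -d)
printf '%s\n' \
    '-3.0 -98.5 409.19' \
    '-3.0 -98.5 409.17' \
    '15.0 -98.0 409.16' > "$tmp/temp"

line='-3.0 -98.5'

# -e marks the next argument as a pattern, even when it starts with "-".
n_e=$(grep -c -e "$line" "$tmp/temp")

# "--" ends option parsing, so what follows is treated as an operand.
n_dd=$(grep -c -- "$line" "$tmp/temp")

# -F treats the pattern as a fixed string, so "." matches only a literal dot.
n_F=$(grep -c -F -e "$line" "$tmp/temp")

echo "$n_e $n_dd $n_F"
rm -r "$tmp"
```

All three calls count the two matching rows. In the original loop, the equivalent change would be n=$(grep -c -e "$line" temp).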
Note that your code is quite inefficient.
Unless your data is huge, you can use a single awk call to do all the processing. Perhaps:
awk '
    {
        k = $1 OFS $2      # key: the first two columns
        n[k]++             # number of rows seen for this key
        s3[k] += $3
        s4[k] += $4
        s5[k] += $5
        s6[k] += $6
        s7[k] += $7
    }
    END {
        for (k in n)
            print k,
                s3[k]/n[k],
                s4[k]/n[k],
                s5[k]/n[k],
                s6[k]/n[k],
                s7[k]/n[k]
    }
' file1 >Averages.dat
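One thing to be aware of with the in-memory approach: for (k in n) visits the keys in an unspecified order, so the output may not come out sorted. A minimal sketch of one way to handle that is to sort the averaged rows afterwards. The loop over columns 3 to NF is my own variation on the per-column sums above, and the three input rows are taken from the question, deliberately out of order.

```shell
tmp=$(mktemp -d)
# Three sample rows from the question, deliberately out of order.
cat > "$tmp/file1" <<'EOF'
15.0 -98.0 409.16 5.2951 0.0104 0.0061 651.24
15.0 -98.5 409.19 5.4588 0.0107 0.0063 651.34
15.0 -98.5 409.17 5.3488 0.0105 0.0062 651.30
EOF

result=$(awk '
    {
        k = $1 OFS $2                       # key: the first two columns
        n[k]++
        for (i = 3; i <= NF; i++) s[k, i] += $i
        nf = NF                             # remember the column count for END
    }
    END {
        for (k in n) {                      # unspecified visiting order
            out = k
            for (i = 3; i <= nf; i++) out = out OFS (s[k, i] / n[k])
            print out
        }
    }
' "$tmp/file1" | LC_ALL=C sort -n -k1,1 -k2,2)   # restore numeric order by key

printf '%s\n' "$result"
rm -r "$tmp"
```

The two -98.5 rows collapse into one averaged row, which sorts ahead of the -98.0 row.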
If the input is huge, you may need to sort first and use a processing procedure that tracks when the key changes so that you don't need to store everything in memory. Perhaps:
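sort -n -k1,1 -k2,2 file1 | awk '
    # Print the averages for the group that just ended.
    function p() { print pk, s3/n, s4/n, s5/n, s6/n, s7/n }
    {
        k = $1 OFS $2
        if (k != pk) {             # key changed: flush and reset
            if (NR > 1) p()
            n = s3 = s4 = s5 = s6 = s7 = 0
        }
        n++
        s3 += $3
        s4 += $4
        s5 += $5
        s6 += $6
        s7 += $7
        pk = k
    }
    END { if (NR) p() }            # avoid dividing by zero on empty input
' > Averages.dat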
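To see the key-change technique on the eight sample rows from the question, here is a sketch that wraps the pipeline in a shell function. The name avg_by_key is hypothetical, and the if (NR) guard in END is my addition so that empty input does not trigger a division by zero.

```shell
tmp=$(mktemp -d)
# The eight sample rows from the question.
cat > "$tmp/file1" <<'EOF'
15.0 -100.0 409.27 5.7103 0.0106 0.0062 652.04
15.0 -99.5 409.14 5.6868 0.0109 0.0065 652.02
15.0 -99.0 409.23 5.4866 0.0108 0.0063 651.47
15.0 -98.5 409.19 5.4588 0.0107 0.0063 651.34
15.0 -98.5 409.17 5.3488 0.0105 0.0062 651.30
15.0 -98.0 409.16 5.2951 0.0104 0.0061 651.24
15.0 -97.5 409.17 5.2647 0.0104 0.0061 651.22
15.0 -97.0 409.27 5.0288 0.0098 0.0056 650.60
EOF

# avg_by_key (hypothetical name): sort by both keys, then average each run
# of rows that share a key, holding only one group in memory at a time.
avg_by_key() {
    LC_ALL=C sort -n -k1,1 -k2,2 "$1" | awk '
        function p() { print pk, s3/n, s4/n, s5/n, s6/n, s7/n }
        {
            k = $1 OFS $2
            if (k != pk) {              # key changed: flush the finished group
                if (NR > 1) p()
                n = s3 = s4 = s5 = s6 = s7 = 0
            }
            n++
            s3 += $3; s4 += $4; s5 += $5; s6 += $6; s7 += $7
            pk = k
        }
        END { if (NR) p() }             # skip the final flush on empty input
    '
}

averages=$(avg_by_key "$tmp/file1")
printf '%s\n' "$averages"
rm -r "$tmp"
```

The eight input rows become seven output rows, with the two -98.5 rows merged into one. Note that awk prints computed numbers using its default output format (%.6g), so trailing zeros are dropped: a value such as 650.60 comes out as 650.6. If you need fixed precision, printf with an explicit format would be the usual choice.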