I have a file with 5 fields per line. I am evaluating the file 4 lines at a time, so records 1-4 are evaluated as one set and records 5-8 as another. Within each set, I want to extract the time from field 5 of the record where field 4 has its max value. If the max value in field 4 occurs more than once, then compare field 2 among those records and use the time in field 5 from the record with the max value in field 2.
For example, in the first 4 records the max value in field 4 (53) is duplicated. In that case I need to look at field 2, find the maximum value among the tied records, and print the time in field 5 from that record.
The Data Set is:
00 31444 8.7 24 00:04:32
00 44574 12.4 25 00:01:41
00 74984 20.8 53 00:02:22
00 84465 23.5 53 00:12:33
01 34748 9.7 38 01:59:28
01 44471 12.4 37 01:55:29
01 74280 20.6 58 01:10:24
01 80673 22.4 53 01:55:49
The desired output for records 1 through 4 is 00:12:33. The desired output for records 5 through 8 is 01:10:24.
Here is my attempt:
Evaluate Records 1 through 4
awk 'NR==1,NR==4 {if ($4 > max || ($4 == max && $2 > max2)) {max = $4; max2 = $2; time = $5}} END {print time}' test.txt
Output is: 00:12:33
Evaluate Records 5 through 8
awk 'NR==5,NR==8 {if ($4 > max || ($4 == max && $2 > max2)) {max = $4; max2 = $2; time = $5}} END {print time}' test.txt
Output is 01:10:24
Any suggestions on how to evaluate the record ranges more efficiently without having to write an awk statement for each set of records?
Thanks
Based on your sample input, the fact that there are 4 lines for each key (the first field) seems to be irrelevant; what you really want is one line of output per key. So consider sorting the input by key and then by your comparison fields in descending order (field 4, then field 2), and printing field 5 from the first line seen for each key:
$ sort -k1,1n -k4,4nr -k2,2nr file | awk '!seen[$1]++{print $5}'
00:12:33
01:10:24
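If the sets really are defined by position (every 4 lines) rather than by the value of the first field, a single awk pass with no sort may also work. The following is only a sketch under that assumption; the function name best_times and the optional set-size argument are made up for illustration and are not part of the original question:

```shell
# best_times FILE [N]: hypothetical helper, N lines per set (default 4).
best_times() {
  awk -v n="${2:-4}" '
    # First line of a set: print the result of the previous set, then reset.
    (NR - 1) % n == 0 {
      if (NR > 1) print time
      max4 = $4 + 0; max2 = $2 + 0; time = $5
      next
    }
    # New field-4 maximum, or a field-4 tie broken by a larger field 2.
    ($4 + 0 > max4) || ($4 + 0 == max4 && $2 + 0 > max2) {
      max4 = $4 + 0; max2 = $2 + 0; time = $5
    }
    # Print the result of the last (possibly short) set.
    END { if (NR > 0) print time }
  ' "$1"
}
```

Calling `best_times test.txt` on the sample data should print 00:12:33 and then 01:10:24. Unlike the sort-based pipeline, this keeps the output in input order and does not depend on field 1 being unique per set.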