Suppose I've got a log file like this (1st column is a timestamp):
1699740442177 Start A
1699740442177 Start B
1699740442255 Start C
1699740442261 Finish B
1699740442337 Finish C
1699740442337 Finish A
I want to determine how much time it took on average for A, B, and C. So I wrote an awk script:
/Start A/ {
    startA = $1
}
/Finish A/ {
    sumA += $1 - startA
    numA++
}
/Start B/ {
    startB = $1
}
/Finish B/ {
    sumB += $1 - startB
    numB++
}
/Start C/ {
    startC = $1
}
/Finish C/ {
    sumC += $1 - startC
    numC++
}
END {
    print sumA/numA, sumB/numB, sumC/numC
}
The script works, but it is mostly boilerplate. How would you suggest improving the script to make it less repetitive?
The number of letters may vary (10-15 letters max). We can assume there is no overlap between Start/Finish pairs for the same letter.
You could store the starting timestamps, running totals, and counts in arrays, all indexed by the third column, and print the averages in the END block:
$2 == "Start" { t[$3] = $1; n[$3] += 1 }
$2 == "Finish" { s[$3] += $1 - t[$3] }
END { for (k in n) print "Average of", k, "is", s[k] / n[k] }
For the sample log this prints the following (awk does not guarantee the iteration order of for-in, so the lines may come out in a different order):
Average of A is 160
Average of B is 84
Average of C is 82
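If the log might contain a Start that never finishes, or a Finish with no preceding Start, counting on Start would skew the averages. Here is a minimal sketch of a more defensive variant that only counts completed pairs. The shell wrapper and the function name avg_durations are made up for illustration, and the log format is assumed to be exactly the three columns shown in the question.

```shell
# Hypothetical wrapper: reads the named log files, or stdin if none are given.
avg_durations() {
  awk '
    # Remember the most recent start time for each name (third column).
    $2 == "Start" { start[$3] = $1 }

    # Only count a Finish that has a pending Start for the same name.
    $2 == "Finish" && ($3 in start) {
      sum[$3] += $1 - start[$3]
      count[$3]++
      delete start[$3]   # the pair is complete; forget the start time
    }

    # Iteration order of for-in is unspecified, so pipe through sort if needed.
    END {
      for (k in count) print "Average of", k, "is", sum[k] / count[k]
    }
  ' "$@"
}
```

Assuming the log is saved as log.txt (a placeholder name), avg_durations log.txt | sort prints the averages in alphabetical order. Names that never reach a completed Start/Finish pair are simply left out of the report, which also avoids a division by zero.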