awk refactoring

How can I reduce the boilerplate in this awk script?

Suppose I've got a log file like this (1st column is a timestamp):

``````
1699740442177 Start A
1699740442177 Start B
1699740442255 Start C
1699740442261 Finish B
1699740442337 Finish C
1699740442337 Finish A
``````

I want to determine how long, on average, each of `A`, `B`, and `C` took. So I wrote an `awk` script:

``````
/Start A/ {
    startA = $1
}
/Finish A/ {
    sumA += $1 - startA
    numA++
}
/Start B/ {
    startB = $1
}
/Finish B/ {
    sumB += $1 - startB
    numB++
}
/Start C/ {
    startC = $1
}
/Finish C/ {
    sumC += $1 - startC
    numC++
}
END {
    print sumA/numA, sumB/numB, sumC/numC
}
``````

The script works, but it is repetitive: every letter needs its own pair of rules. How would you suggest improving the script to reduce that boilerplate?

The number of letters may vary (10-15 letters max). We can assume there is no overlap between `Start/Finish` pairs for the same letter.

Solution

• You could store the starting timestamps, the accumulated durations, and the counts in arrays, all indexed by the third column (the letter), and print the averages in the `END` block:

``````
$2 == "Start"  { t[$3] = $1; n[$3] += 1 }
$2 == "Finish" { s[$3] += $1 - t[$3] }
END { for (k in n) print "Average of", k, "is", s[k] / n[k] }
``````
``````
Average of A is 160
Average of B is 84
Average of C is 82
``````
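If the log might be truncated or contain a `Finish` without a matching `Start`, a slightly more defensive variation of the array approach could look like the sketch below. It is not the answer's exact script: it counts a pair only when the `Finish` line arrives (as the original script in the question does), skips `Finish` lines with no pending `Start`, and pipes through `sort` because `for (k in n)` visits keys in an unspecified order. The function name `avg_durations` and the file name `sample.log` are made up for illustration.

```sh
#!/bin/sh
# Hypothetical helper: average Start/Finish duration per label (column 3).
# Reads the files given as arguments, or stdin when there are none.
avg_durations() {
    awk '
    $2 == "Start" { t[$3] = $1 }            # remember the pending start time
    $2 == "Finish" && ($3 in t) {           # ignore a Finish with no pending Start
        s[$3] += $1 - t[$3]                 # accumulate the duration
        n[$3]++                             # count only completed pairs
        delete t[$3]                        # the pair is closed
    }
    END { for (k in n) print "Average of", k, "is", s[k] / n[k] }
    ' "$@" | sort                           # make the output order deterministic
}

# Sample data copied from the question (sample.log is an assumed file name).
cat > sample.log <<'EOF'
1699740442177 Start A
1699740442177 Start B
1699740442255 Start C
1699740442261 Finish B
1699740442337 Finish C
1699740442337 Finish A
EOF

avg_durations sample.log
```

On the sample data this should print the same three averages as above (160, 84, and 82). A label whose `Start` never gets a `Finish` simply does not appear in the output, rather than skewing the average.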