
How to make this awk script less boilerplate?


Suppose I've got a log file like this (1st column is a timestamp):

1699740442177 Start A
1699740442177 Start B
1699740442255 Start C
1699740442261 Finish B
1699740442337 Finish C
1699740442337 Finish A

I want to determine how much time A, B, and C each took on average, so I wrote an awk script:

/Start A/ {
  startA = $1
}
/Finish A/ {
  sumA += $1 - startA
  numA++ 
}
/Start B/ {
  startB = $1
}
/Finish B/ {
  sumB += $1 - startB
  numB++ 
}
/Start C/ {
  startC = $1
}
/Finish C/ {
  sumC += $1 - startC
  numC++ 
}
END {
  print sumA/numA, sumB/numB, sumC/numC
}

The script works, but it is full of boilerplate. How would you suggest improving it to make it less repetitive?

The number of letters may vary (10-15 letters max). We can assume there is no overlap between Start/Finish pairs for the same letter.


Solution

  • You could store the start timestamps, running sums, and counts in arrays, all indexed by the third column (the letter), and print the averages in the END block:

    $2 == "Start"  { t[$3] = $1; n[$3] += 1 }
    $2 == "Finish" { s[$3] += $1 - t[$3] }
    END { for (k in n) print "Average of", k, "is", s[k] / n[k] }

    For the sample log this prints the following (awk does not specify the iteration order of the for (k in n) loop, so the lines may come out in a different order):
    
    Average of A is 160
    Average of B is 84
    Average of C is 82
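
The snippet above counts a letter when its Start line is seen, so a Start that never gets a Finish (for example in a truncated log) would lower that letter's average. Below is a minimal sketch of a variant that counts only completed Start/Finish pairs, as the script in the question does, and pipes the result through sort for a stable order. The function name avg_durations and the "letter average" output format are assumptions made for illustration, not part of the original answer.

```shell
# avg_durations is a hypothetical helper name; it reads log lines of the
# form "<timestamp> Start|Finish <letter>" from the given files or stdin.
avg_durations() {
  awk '
    # Remember the most recent start time for this letter.
    $2 == "Start" { t[$3] = $1 }

    # Only handle a Finish that has a pending Start for the same letter.
    $2 == "Finish" && ($3 in t) {
      s[$3] += $1 - t[$3]   # accumulate the duration
      n[$3]++               # count completed pairs only
      delete t[$3]          # the pair is closed
    }

    # n[k] is at least 1 for every key, so the division is safe.
    END { for (k in n) print k, s[k] / n[k] }
  ' "$@" | sort
}

# Usage with the sample log from the question.
avg_durations <<'EOF'
1699740442177 Start A
1699740442177 Start B
1699740442255 Start C
1699740442261 Finish B
1699740442337 Finish C
1699740442337 Finish A
EOF
```

For the sample log this prints "A 160", "B 84", and "C 82", one per line. Letters with no completed pair are simply omitted, which also avoids the division by zero that the original END block would hit if, say, numB were 0.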