My log file has the following format:
[30/Jan/2015:10:10:30 +0000] 12.30.30.204 xff=- reqId=[-] status_check len=- GET /api/getstatus HTTP/1.1 mi=- ec=- 200 425
[30/Jan/2015:10:11:00 +0000] 12.30.30.204 xff=- reqId=[-] status_check len=- GET /api/getstatus HTTP/1.1 mi=- ec=- 200 261
[30/Jan/2015:10:11:29 +0000] 12.30.30.204 xff=- reqId=[-] status_check len=- GET /api/getstatus HTTP/1.1 mi=- ec=- 200 232
[30/Jan/2015:10:12:00 +0000] 12.30.30.204 xff=- reqId=[-] status_check len=- GET /api/getstatus HTTP/1.1 mi=- ec=- 200 315
[30/Jan/2015:10:12:29 +0000] 12.30.30.204 xff=- reqId=[-] status_check len=- GET /api/getstatus HTTP/1.1 mi=- ec=- 200 221
[30/Jan/2015:10:12:57 +0000] 12.30.30.182 xff=- reqId=[-] status_check len=- GET /api/getstatus HTTP/1.1 mi=- ec=- 200 218
Each line in this log file has a timestamp in the first field and a response time in the last field. Is there a way in awk to compute the average response time over specific intervals? For example, calculating the average response time for every five minutes, based on the timestamps in the log file.
Or is there a better alternative to awk for this? Please suggest.
Update:
I have tried the following, which is a static way of doing it and only gives the average for a single time interval.
$ grep "30/Jan/2015:10:1[0-4]" mylog.log | awk '{resp+=$NF;cnt++;}END{print "Avg:"int(resp/cnt)}'
But I need to do this for the whole file, for every 5-minute interval. Even if I loop over the command, how can I pass the date to it dynamically? The log file, and the dates in it, vary every time.
Hm. GNU date does not like your date format, so I guess we'll have to parse it ourselves. I'm thinking along these lines (this requires gawk for mktime):
# returns the seconds since epoch that stamp represents. This will be
# the first field in the line, with [] and everything. It's rather
# rudimentary:
function parse_timestamp(stamp) {
    # Split stamp into tokens delimited by [, ], /, : or space
    split(stamp, c, "[][/: ]")
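    # For example, "[30/Jan/2015:10:10:30" splits into c[2]="30", c[3]="Jan",
    # c[4]="2015", c[5]="10", c[6]="10", c[7]="30", so the reassembled string
    # handed to mktime below reads "2015 1 30 10 10 30".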
    # reassemble (using the lookup table for the months from below) in a
    # format that mktime understands (then call mktime).
    return mktime(c[4] " " mnums[c[3]] " " c[2] " " c[5] " " c[6] " " c[7])
}
BEGIN {
    # parse_timestamp needs this lookup table.
    split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", mnames)
    for(i = 1; i <= length(mnames); ++i) {
        mnums[mnames[i]] = i
    }
    # time is a parameter supplied by you.
    start = parse_timestamp(time)
    end = start + 300
    if(start == -1) {
        print "Warning: Could not parse timestamp \"" time "\""
    }
}
{
    # in each line: parse the timestamp
    curtime = parse_timestamp($1)
}
# if it lies in the interval you want, sum up the last field and increase
# the counter
curtime >= start && curtime < end {
    sum += $NF
    ++count
}
END {
    # and in the end, print the average.
    print "Avg: " (count == 0 ? "undef" : sum / count)
}
Put this in a file, say average.awk, and call
awk -v time='[30/Jan/2015:10:11:20 +0000]' -f average.awk foo.log
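For the sample log above, the five-minute window starting at 10:11:20 covers the four entries with response times 232, 315, 221 and 218, so this should print Avg: 246.5.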
If you are sure the log file will be sorted in ascending order (which is probably the case), you could make this more efficient by replacing
curtime >= start && curtime < end {
    sum += $NF
    ++count
}
with
curtime >= end {
    exit
}
curtime >= start {
    sum += $NF
    ++count
}
This stops scanning for matching log entries as soon as it hits the first entry past the range you were looking for.
Addendum: Since the OP clarified that they want summaries for all five-minute intervals in a sorted log file, a tweaked script to do that is
#!/usr/bin/awk -f
function parse_timestamp(stamp) {
    split(stamp, c, "[][/: ]")
    return mktime(c[4] " " mnums[c[3]] " " c[2] " " c[5] " " c[6] " " c[7])
}
BEGIN {
    split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", mnames)
    for(i = 1; i <= length(mnames); ++i) {
        mnums[mnames[i]] = i
    }
}
{
    curtime = parse_timestamp($1)
}
NR == 1 {
    # pull the start time from the first line
    start = curtime
    end = start + 300
}
curtime >= end {
    # print result and reset counters whenever the current entry has moved
    # past the end of the interval; looping also emits "undef" for any empty
    # intervals skipped over by a gap in the log, so every entry lands in the
    # right interval.
    while(curtime >= end) {
        print "Avg: " (count == 0 ? "undef" : sum / count)
        sum = 0
        count = 0
        end += 300
    }
}
{
    sum += $NF
    ++count
}
END {
    # print once more at the very end for the last, unfinished interval.
    print "Avg: " (count == 0 ? "undef" : sum / count)
}
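As before, put this in average.awk and run it with awk -f average.awk mylog.log (or make it executable and call ./average.awk mylog.log, courtesy of the shebang line). For the six sample lines above, every entry falls into the first five-minute interval starting at 10:10:30, so the output should be a single line along the lines of Avg: 278.667.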