I have the following in a log file:
01:31:01,222 Received event
01:31:01,435 Received event
01:31:01,441 Received event
01:31:01,587 Received event
01:31:02,110 Received event
01:31:02,650 Received event
01:31:02,869 Received event
01:31:03,034 Received event
01:31:03,222 Received event
I would like to group these lines by second and count the number of lines in each group, producing the following output:
01:31:01 4
01:31:02 3
01:31:03 2
Ideally I'd like to do this with a simple awk script, without having to resort to Perl or Python. Any ideas? Thanks.
Sounds like a job for awk:
awk -F, '{a[$1]++}END{for(i in a){print i, a[i]}}' file.txt
Output:
01:31:01 4
01:31:02 3
01:31:03 2
Explanation:
I'm using the -F option (field separator) and setting it to a comma (,). Since the comma sits between the seconds and the milliseconds, field 1 ($1) then holds the time with one-second precision.
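To see what the field separator does on its own, here is a quick check on a single line taken from the question. The variable name first_field is only illustrative.

```shell
# Split one sample line on the comma and keep only the first field
first_field=$(echo '01:31:01,222 Received event' | awk -F, '{ print $1 }')
echo "$first_field"   # prints 01:31:01
```

Everything after the comma (the milliseconds and the message) lands in $2 and is simply ignored by the counting script.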
Explanation of the script itself (in a multiline form):
# Runs on every line and increments a count tied to the first field (the time)
# (The associative array a will get created on first access)
{a[$1]++}
# Runs after all lines have been processed. Iterates through the array 'a' and prints
# each key (time) and its associated value (count)
END {
    for (i in a) {
        print i, a[i]
    }
}
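One caveat: awk does not guarantee any particular order when iterating over an array with for (i in a), so the lines may not always come out in chronological order. A minimal sketch of one way around that, assuming a plain lexical sort is acceptable (it is for zero-padded HH:MM:SS timestamps), is to pipe the result through sort. The names log_lines, count and t are illustrative, and the sample data is copied from the question.

```shell
# Sample log lines from the question, kept in a variable for a self-contained demo
log_lines='01:31:01,222 Received event
01:31:01,435 Received event
01:31:01,441 Received event
01:31:01,587 Received event
01:31:02,110 Received event
01:31:02,650 Received event
01:31:02,869 Received event
01:31:03,034 Received event
01:31:03,222 Received event'

# Count lines per second, then sort by timestamp, because the order
# of awk's for-in loop is unspecified
result=$(printf '%s\n' "$log_lines" \
    | awk -F, '{ count[$1]++ } END { for (t in count) print t, count[t] }' \
    | sort)

echo "$result"
```

With a real log file, the same pipeline would read the file directly: awk -F, '...' file.txt | sort.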