
How to group grep results by seconds


I have the following in a log file,

01:31:01,222 Received event
01:31:01,435 Received event
01:31:01,441 Received event
01:31:01,587 Received event
01:31:02,110 Received event
01:31:02,650 Received event
01:31:02,869 Received event
01:31:03,034 Received event
01:31:03,222 Received event

I would like to group this by seconds and count the number of lines in each group to output the following,

01:31:01 4
01:31:02 3
01:31:03 2

Ideally I'd like to do this in a simple awk script without having to resort to Perl or Python. Any ideas? Thanks.


Solution

  • Sounds like a job for awk:

    awk -F, '{a[$1]++}END{for(i in a){print i, a[i]}}' file.txt
    

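    Output (awk does not guarantee the iteration order of for-in, so the lines may appear in a different order; pipe the result through sort if the order matters):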

    01:31:01 4
    01:31:02 3
    01:31:03 2
    

    Explanation:

    The -F option sets the field separator, here a comma. Everything before the comma, which is the time at one-second resolution, then ends up in field 1 ($1), and the milliseconds and message go into field 2.

    Explanation of the script itself (in a multiline form):

    # Runs on every line and increments a count tied to the first field (the time)
    # (The associative array a will get created on first access)
    {a[$1]++}
    
    # Runs after all lines have been processed. Iterates through the array 'a' and prints
    # each key (time) and its associated value (count)
    END {
        for(i in a){
            print i, a[i]
        }
    }
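
Since awk leaves the order of a for-in loop unspecified, here is a minimal sketch of the same idea with the output piped through sort so the groups come out in chronological order. The file name events.log and the function name count_per_second are made up for this example; the awk logic is the same as in the one-liner above.

```shell
# Recreate the sample log from the question (events.log is a hypothetical name).
cat > events.log <<'EOF'
01:31:01,222 Received event
01:31:01,435 Received event
01:31:01,441 Received event
01:31:01,587 Received event
01:31:02,110 Received event
01:31:02,650 Received event
01:31:02,869 Received event
01:31:03,034 Received event
01:31:03,222 Received event
EOF

# Hypothetical helper: counts lines per second.
# -F, splits each line at the comma, so $1 is the HH:MM:SS part.
# Reads the files given as arguments, or stdin when none are given.
# sort makes the order deterministic (for-in order is unspecified in awk).
count_per_second() {
    awk -F, '{ count[$1]++ } END { for (t in count) print t, count[t] }' "$@" | sort
}

count_per_second events.log
```

Because the helper falls back to stdin, it should also work at the end of a pipeline, for example after a grep that first selects the lines of interest.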