
AWK - Is it possible to Breakdown a log file by a distinct field && by hour


Question

I am trying to find out whether it is possible, with awk alone, to read a log file and have awk output each distinct message along with an hour-by-hour breakdown (00-23) and a count for each particular hour / distinct message combination.

Example

Output requested

Message1
00 13
01 30
...
23 6

Message2
00 50
01 10
...
23 120
etc, etc

The input file would look a little something like the following:

blah,blah
2016-06-24 00:30:54 blah Message1 7 rand rand2
2016-06-24 00:40:12 blah Message2 35 rand rand2
2016-06-24 00:42:15 blah Message2 12 rand rand2
2016-06-24 00:58:01 blah Message1 5 rand rand2
2016-06-24 00:58:12 blah Message2 3 rand rand2
2016-06-24 01:02:25 blah Message2 2 rand rand2
2016-06-24 01:02:30 blah Message1 3 rand rand2
2016-06-24 01:05:14 blah Message1 10 rand rand2
2016-06-24 01:30:56 blah Message2 5 rand rand2
2016-06-24 01:55:41 blah Message2 3 rand rand2
blah, blah

Please note that this is a made up input file.

To get the requested output from this input file I know that I will need to print $4 and then, on a new line, do something like print substr($2,1,2)" "sum[$5]. For the same hour and the same $4 I will have to add the $5 values together.
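As a quick sanity check of that substr idea (with awk's default whitespace field splitting, so the full timestamp stays in $2):

```shell
# $4 is the message, substr($2,1,2) is the hour, $5 is the value to sum.
echo '2016-06-24 00:30:54 blah Message1 7 rand rand2' |
  awk '{ print $4, substr($2,1,2), $5 }'
# prints: Message1 00 7
```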

Code

Also note that I am having to use awk 3.1.7, so I can't do any of the fancy new stuff that is in gawk 4.1.0+.

I know how to get the distinct messages.

{
    msg[$4]++
}
END {
    for (m in msg) {
        print m
    }
}

To return the hour could I do something along the lines of:

{
    msg[$4]++
    hr[$4] = substr($2,1,2)
}
END {
    for (m in msg) {
        print m
        print hr[m]
    }
}

And finally for the sum would it be something along the lines of:

{
    msg[$4]++
    hr[$4] = substr($2,1,2)
    sum[$4] += $5
}
END {
    for (m in msg) {
        print m
        print hr[m]" "sum[m]
    }
}

Any and all help is greatly appreciated.


Solution

  • You'll want something like:

    $ cat tst.awk
    BEGIN { FS="[ :]" }
    { sum[$6,$2]+=$7; msgs[$6]; hrs[$2] }
    END {
        for (msg in msgs) {
            print msg
            for (hr in hrs) {
                print hr, sum[msg,hr]+0
            }
            print ""
        }
    }
    
    $ awk -f tst.awk file
    Message1
    00 12
    01 13
    
    Message2
    00 50
    01 10
    

    but obviously it's a bit of a guess, since it's run against your posted sample input and you didn't provide the associated expected output.
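One refinement you may want (my assumption, not part of the original answer): since the sample input contains non-log lines like blah,blah, a guard that only processes timestamped lines would keep that noise out of the counts. A sketch, assuming every real log line starts with a YYYY-MM-DD date in $1 (interval expressions like {4} are avoided since old gawk needs --re-interval for them):

```shell
cat > tst.awk <<'EOF'
BEGIN { FS="[ :]" }
# Only process lines whose first field looks like a date, so
# header/footer noise such as "blah,blah" is ignored.
$1 ~ /^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]$/ {
    sum[$6,$2] += $7; msgs[$6]; hrs[$2]
}
END {
    for (msg in msgs) {
        print msg
        for (hr in hrs) print hr, sum[msg,hr]+0
        print ""
    }
}
EOF
# then run it the same way: awk -f tst.awk file
```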

    By the way, regarding the question subject line AWK - Is it possible...: assuming it's about manipulating text, the answer to that question is always "yes", so there's no need to ask if it's possible.

    I just noticed your previous question, where you say the hour may not always be present in your data, so this may be what you're really looking for:

    $ cat tst.awk
    BEGIN { FS="[ :]" }
    { sum[$6,$2+0]+=$7; msgs[$6] }
    END {
        for (msg in msgs) {
            print msg
            #for (hr=0; hr<=23; hr++) {
            for (hr=0; hr<=4; hr++) {
                printf "%02d %d\n", hr, sum[msg,hr]
            }
            print ""
        }
    }
    $
    $ awk -f tst.awk file
    Message1
    00 12
    01 13
    02 0
    03 0
    04 0
    
    Message2
    00 50
    01 10
    02 0
    03 0
    04 0
    

    Obviously, change the "4" to "23". I'd also recommend considering CSV output instead, so you can import it into Excel, etc., e.g.:

    $ cat tst.awk
    BEGIN { FS="[ :]"; OFS="," }
    { sum[$6,$2+0]+=$7; msgs[$6] }
    END {
        printf "hr"
        for (msg in msgs) {
            printf "%s%s", OFS, msg
        }
        print ""
        for (hr=0; hr<=4; hr++) {
            printf "%02d", hr
            for (msg in msgs) {
                printf "%s%d", OFS, sum[msg,hr]
            }
            print ""
        }
    }
    
    $ awk -f tst.awk file
    hr,Message1,Message2
    00,12,50
    01,13,10
    02,0,0
    03,0,0
    04,0,0
    
    $ awk -f tst.awk file | column -s, -t
    hr  Message1  Message2
    00  12        50
    01  13        10
    02  0         0
    03  0         0
    04  0         0
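
One caveat about the CSV variant (my addition, not part of the original answer): for (msg in msgs) visits keys in an unspecified order, so the message columns can come out in a different order between runs or awk versions. A sketch, still avoiding gawk 4-only features and assuming the non-log blah lines are absent or filtered out, that pins the columns to first-seen order:

```shell
cat > tst.awk <<'EOF'
BEGIN { FS="[ :]"; OFS="," }
# Remember each message the first time it appears, so the column
# order is stable instead of depending on for-in hash order.
!($6 in msgs) { msgs[$6]; order[++n] = $6 }
{ sum[$6,$2+0] += $7 }
END {
    printf "hr"
    for (i=1; i<=n; i++) printf "%s%s", OFS, order[i]
    print ""
    for (hr=0; hr<=4; hr++) {
        printf "%02d", hr
        for (i=1; i<=n; i++) printf "%s%d", OFS, sum[order[i],hr]
        print ""
    }
}
EOF
# usage as before: awk -f tst.awk file
```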