
How to aggregate log records by time


I have a huge log file in which every message is prefixed with a timestamp with microsecond precision. I want to find the 10-second time window in which the highest number of messages were logged. How can I do that?


Solution

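  • You'd need to read the file line by line, work out which 10 s period each timestamp falls into, and keep track of which period had the biggest "member" count (number of messages). As long as the log is in timestamp order, you only ever need a counter for the current period plus the best period seen so far, so memory use stays constant no matter how big the file is. Note that these periods are aligned to fixed boundaries (12:00:00 to 12:00:10, 12:00:10 to 12:00:20, and so on), so a burst that straddles a boundary gets split across two periods.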

    You don't specify which language, so I'll just use pseudocode:

    1. read a line
    2. extract the timestamp and convert it to a 10 s interval number (for example, the timestamp in microseconds integer-divided by 10,000,000)
    3. if this interval number differs from the previous line's, the previous interval is complete: "remember" its membership count and start a new interval counter
    4. if the previous interval's membership count is bigger than the biggest count recorded so far, make the previous interval the new "biggest" interval
    5. increment the interval counter for this new line
    6. repeat until the file's been consumed, then apply step 4 one last time to the final interval, since no later line ever closes it
    7. spit out the recorded interval number, which will have had the biggest membership count
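Here is a minimal Python sketch of the steps above. It assumes (hypothetically, since the question doesn't give the format) that every line begins with a timestamp like `2023-01-15 12:00:05.500000`; `TS_FORMAT`, `TS_LEN`, `to_micros` and `busiest_bucket` are illustrative names, so adjust the parsing to your real log format. Timestamps are converted to integer microseconds so the bucket arithmetic is exact.

```python
from datetime import datetime
from typing import Iterable, Optional, Tuple

# ASSUMPTION: each line starts with "YYYY-MM-DD HH:MM:SS.ffffff" (26 characters).
TS_FORMAT = "%Y-%m-%d %H:%M:%S.%f"
TS_LEN = 26
WINDOW_US = 10 * 1_000_000  # 10 seconds expressed in microseconds
EPOCH = datetime(1970, 1, 1)


def to_micros(line: str) -> int:
    """Parse the leading timestamp into integer microseconds since 1970-01-01."""
    delta = datetime.strptime(line[:TS_LEN], TS_FORMAT) - EPOCH
    return (delta.days * 86_400 + delta.seconds) * 1_000_000 + delta.microseconds


def busiest_bucket(lines: Iterable[str]) -> Optional[Tuple[int, int]]:
    """Return (bucket_start_us, count) for the aligned 10 s bucket with the most lines.

    Assumes the lines are sorted by timestamp, as a log file normally is.
    Returns None if there are no non-blank lines.
    """
    best: Optional[Tuple[int, int]] = None
    current: Optional[int] = None  # interval number of the bucket being counted
    count = 0
    for line in lines:                       # step 1: read a line
        if not line.strip():
            continue                         # skip blank lines
        bucket = to_micros(line) // WINDOW_US  # step 2: interval number
        if bucket != current:                # step 3: previous interval is complete
            if current is not None and (best is None or count > best[1]):
                best = (current * WINDOW_US, count)  # step 4: new "biggest"
            current, count = bucket, 0
        count += 1                           # step 5: count this line
    # Step 6: the final interval is never closed inside the loop, so check it here.
    if current is not None and (best is None or count > best[1]):
        best = (current * WINDOW_US, count)
    return best                              # step 7
```

Because `busiest_bucket` accepts any iterable of lines, you can pass it an open file object (for example `with open("app.log") as f: print(busiest_bucket(f))`, where `app.log` is a placeholder name) and Python will read the file lazily, one line at a time.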
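The aligned buckets in the pseudocode can underreport a burst that straddles a 10 s boundary. If the question is read as "any 10 s window", one alternative (a different technique from the answer's bucketing: a sliding window kept in a `collections.deque`) is to hold only the timestamps less than 10 s older than the newest one and track the largest size the queue ever reaches. This is a sketch under stated assumptions: `busiest_sliding_window` is a hypothetical name, and it expects timestamps already converted to integer microseconds and sorted ascending.

```python
from collections import deque
from typing import Iterable, Optional, Tuple

WINDOW_US = 10 * 1_000_000  # 10 seconds expressed in microseconds


def busiest_sliding_window(timestamps_us: Iterable[int]) -> Optional[Tuple[int, int]]:
    """Return (window_start_us, count) for the half-open window
    [start, start + 10 s) that contains the most events.

    ASSUMPTION: timestamps are integer microseconds, sorted ascending.
    Returns None for empty input.
    """
    window = deque()  # timestamps within 10 s of the newest one seen
    best: Optional[Tuple[int, int]] = None
    for t in timestamps_us:
        window.append(t)
        # Evict events that are 10 s or more older than the current one.
        while t - window[0] >= WINDOW_US:
            window.popleft()
        if best is None or len(window) > best[1]:
            best = (window[0], len(window))
    return best
```

For example, messages at 0, 5, 12, 13, 19.999999 and 21 s put at most 3 messages into any aligned bucket, but the window starting at 12 s contains 4. Memory is bounded by the number of messages inside a single 10 s window rather than by the size of the file.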