
Use Python to get table of frequency of hours from log file


I have a log file that looks like this:

10.0.0.153 - - [12/Mar/2004:12:23:41 -0800] "GET /dccstats/stats-hashes.1year.png HTTP/1.1" 200 1582

216.139.185.45 - - [12/Mar/2004:13:04:01 -0800] "GET /mailman/listinfo/webber HTTP/1.1" 200 6051

pd95f99f2.dip.t-dialin.net - - [12/Mar/2004:13:18:57 -0800] "GET /razor.html HTTP/1.1" 200 2869

d97082.upc-d.chello.nl - - [12/Mar/2004:13:25:45 -0800] "GET /SpamAssassin.html HTTP/1.1" 200 7368

I want to count how many log entries there are for each hour and sort the hours from most to least frequent. For these 4 entries, the result should be the following.

[image: table of hours and their frequencies, not reproduced]

How do I do this with only the packages that come with a standard release of Python 3?

I could look for the position of the first colon in each line, then extract the 2 characters after that position. However, I fear that there could be other colons beforehand.

Is there a more "intelligent" method?


Solution

  • I could look for the position of the first colon in each line, then extract the 2 characters after that position. However, I fear that there could be other colons beforehand

    Instead of looking for the first colon, you can either

    1. split on the ' - - ' separator, then take the piece that follows the first colon after it

      log_message = '216.139.185.45 - - [12/Mar/2004:13:04:01 -0800] "GET /mailman/listinfo/webber HTTP/1.1" 200 6051'
      log_hour = log_message.split(' - - ')[1].split(':')[1]
      
    2. or split directly on the first opening bracket ([) and then on the colon

      log_hour = log_message.split('[')[1].split(':')[1]
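If colons before the timestamp are the worry (for example an IPv6 address in the host field), a regular expression anchored on the bracketed timestamp is one possible alternative. This is only a sketch: the pattern assumes every line carries a timestamp shaped like the ones in the question, and `TIMESTAMP_HOUR` and `extract_hour` are names made up for this illustration.

```python
import re

# Assumed timestamp shape, taken from the sample lines:
# [12/Mar/2004:13:04:01 -0800]. The only capture group is the two-digit hour.
TIMESTAMP_HOUR = re.compile(
    r"\[\d{2}/[A-Za-z]{3}/\d{4}:(\d{2}):\d{2}:\d{2} [+-]\d{4}\]"
)


def extract_hour(log_message):
    """Return the hour of a log line as an int, or None if no timestamp is found."""
    match = TIMESTAMP_HOUR.search(log_message)
    return int(match.group(1)) if match else None


log_message = '216.139.185.45 - - [12/Mar/2004:13:04:01 -0800] "GET /mailman/listinfo/webber HTTP/1.1" 200 6051'
print(extract_hour(log_message))  # 13
```

Returning None for lines without a timestamp lets the caller skip blank or malformed lines instead of failing with an IndexError, which is what the split-based versions would do.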
      

    To get the frequency of each hour, you can use the following code, assuming log_message_list holds the lines of the log file:

    # Start every hour of the day at zero.
    hour_frequency_dict = {hour: 0 for hour in range(24)}
    for log_message in log_message_list:
        log_hour = int(log_message.split(' - - ')[1].split(':')[1])
        hour_frequency_dict[log_hour] += 1
    # Keep only the hours that actually occur in the log.
    hour_frequency_dict = {hour: frequency for hour, frequency in hour_frequency_dict.items() if frequency > 0}
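The question also asks for the hours sorted from most to least frequent. One way to get both the counting and the ordering from the standard library is `collections.Counter`, whose `most_common()` method returns the entries by descending count. The sketch below uses the four sample lines from the question as its input; `hour_of` is a helper name chosen for this example, and it reuses the bracket-based split shown earlier.

```python
from collections import Counter

# The four sample lines from the question, standing in for the real log file.
log_message_list = [
    '10.0.0.153 - - [12/Mar/2004:12:23:41 -0800] "GET /dccstats/stats-hashes.1year.png HTTP/1.1" 200 1582',
    '216.139.185.45 - - [12/Mar/2004:13:04:01 -0800] "GET /mailman/listinfo/webber HTTP/1.1" 200 6051',
    'pd95f99f2.dip.t-dialin.net - - [12/Mar/2004:13:18:57 -0800] "GET /razor.html HTTP/1.1" 200 2869',
    'd97082.upc-d.chello.nl - - [12/Mar/2004:13:25:45 -0800] "GET /SpamAssassin.html HTTP/1.1" 200 7368',
]


def hour_of(log_message):
    """Extract the hour from the bracketed timestamp of one log line."""
    return int(log_message.split('[')[1].split(':')[1])


# Counter tallies each hour; hours that never occur are simply absent.
hour_counts = Counter(hour_of(line) for line in log_message_list)

# most_common() yields (hour, frequency) pairs, highest frequency first.
for hour, frequency in hour_counts.most_common():
    print(f"{hour:02d}  {frequency}")
# 13  3
# 12  1
```

With a real file, the list can be replaced by iterating over an open file object, for instance inside `with open("access.log") as f:`, where `access.log` is a placeholder filename.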