I have a log file that looks like this:
10.0.0.153 - - [12/Mar/2004:12:23:41 -0800] "GET /dccstats/stats-hashes.1year.png HTTP/1.1" 200 1582
216.139.185.45 - - [12/Mar/2004:13:04:01 -0800] "GET /mailman/listinfo/webber HTTP/1.1" 200 6051
pd95f99f2.dip.t-dialin.net - - [12/Mar/2004:13:18:57 -0800] "GET /razor.html HTTP/1.1" 200 2869
d97082.upc-d.chello.nl - - [12/Mar/2004:13:25:45 -0800] "GET /SpamAssassin.html HTTP/1.1" 200 7368
I want to count how many log entries there are for each hour and sort the hours from most to least frequent. For these 4 entries, the result should be the following.
How do I do this with only the packages that come with a standard release of Python 3?
I could look for the position of the first colon in each line, then extract the 2 characters after that position. However, I fear that there could be other colons beforehand.
Is there a more "intelligent" method?
I could look for the position of the first colon in each line, then extract the 2 characters after that position. However, I fear that there could be other colons beforehand
Instead of looking for the first colon, you can split on the ' - - ' separator first and then take the field after the first colon in what remains:
log_message = '216.139.185.45 - - [12/Mar/2004:13:04:01 -0800] "GET /mailman/listinfo/webber HTTP/1.1" 200 6051'
log_hour = log_message.split(' - - ')[1].split(':')[1]
Or split directly on the first opening bracket ([) and then on the colon, which also works when the two fields after the host are not both '-':
log_hour = log_message.split('[')[1].split(':')[1]
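If you would rather not rely on counting colons at all, one option is to pull out the bracketed timestamp with a regular expression and let datetime parse it. This is only a sketch: the names TIMESTAMP_RE and extract_hour are made up for illustration, the format string assumes every timestamp looks like the ones in your sample, and %b expects English month abbreviations (the default C locale).

```python
import re
from datetime import datetime

# Matches the bracketed timestamp, e.g. [12/Mar/2004:13:04:01 -0800].
# TIMESTAMP_RE and extract_hour are hypothetical names for this sketch.
TIMESTAMP_RE = re.compile(r'\[([^\]]+)\]')

def extract_hour(log_message):
    """Return the hour (0-23) of a log line, or None if no timestamp is found."""
    match = TIMESTAMP_RE.search(log_message)
    if match is None:
        return None
    # Assumes the timestamp layout from the sample lines in the question.
    timestamp = datetime.strptime(match.group(1), '%d/%b/%Y:%H:%M:%S %z')
    return timestamp.hour

log_message = '216.139.185.45 - - [12/Mar/2004:13:04:01 -0800] "GET /mailman/listinfo/webber HTTP/1.1" 200 6051'
print(extract_hour(log_message))  # prints 13
```

Parsing the whole timestamp is a bit slower than splitting, but strptime raises a ValueError on a malformed timestamp instead of silently returning the wrong characters.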
To get the frequency per hour, you can use the following code:
# log_message_list is assumed to hold the log lines as strings.
# Start every hour of the day at zero.
hour_frequency_dict = {hour: 0 for hour in range(24)}
for log_message in log_message_list:
    log_hour = int(log_message.split(' - - ')[1].split(':')[1])
    hour_frequency_dict[log_hour] += 1
# Keep only the hours that actually occur in the log.
hour_frequency_dict = {hour: frequency for hour, frequency in hour_frequency_dict.items() if frequency > 0}
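The dictionary above holds the counts but is not sorted by frequency, which the question also asks for. A possible alternative, using only the standard library, is collections.Counter, whose most_common() method returns the entries from most to least frequent. The sketch below assumes the four sample lines from the question are already in a list called log_message_list; in practice you would read them from the file.

```python
from collections import Counter

# The four sample lines from the question; in practice, read them from the log file.
log_message_list = [
    '10.0.0.153 - - [12/Mar/2004:12:23:41 -0800] "GET /dccstats/stats-hashes.1year.png HTTP/1.1" 200 1582',
    '216.139.185.45 - - [12/Mar/2004:13:04:01 -0800] "GET /mailman/listinfo/webber HTTP/1.1" 200 6051',
    'pd95f99f2.dip.t-dialin.net - - [12/Mar/2004:13:18:57 -0800] "GET /razor.html HTTP/1.1" 200 2869',
    'd97082.upc-d.chello.nl - - [12/Mar/2004:13:25:45 -0800] "GET /SpamAssassin.html HTTP/1.1" 200 7368',
]

# Count one occurrence per line, keyed by the hour that follows the first '['.
hour_counts = Counter(
    int(line.split('[')[1].split(':')[1]) for line in log_message_list
)

# most_common() yields (hour, count) pairs sorted from most to least frequent.
for hour, count in hour_counts.most_common():
    print(f'{hour:02d}: {count}')
# prints:
# 13: 3
# 12: 1
```

Counter only stores hours that actually occur, so the final filtering step for zero counts is not needed here.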