python performance loops datetime intervals

Count the events in each 30-minute interval without looping over the events many times

This works and prints the number of events in each 30-minute interval:

00:00 to 00:30, 00:30 to 01:00, ..., 23:30 to 24:00

import datetime
L = ["20231017_021000", "20231017_021100", "20231017_021200", "20231017_052800", "20231017_093100", "20231017_093900"]
d = datetime.datetime.strptime("20231017_000000", "%Y%m%d_%H%M%S")
M = [(d + datetime.timedelta(minutes=30*k)).strftime("%Y%m%d_%H%M%S") for k in range(49)]
Y = [sum([m1 < l <= m2 for l in L]) for m1, m2 in zip(M, M[1:])]
print(Y)
# [0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
# => 3 events between 02:00 and 02:30
# => 1 event between 05:00 and 05:30
# => 2 events between 09:30 and 10:00

Problem: it iterates over the list L 48 times, and L can be long.

How can I do the same in a single pass over L, using only Python built-in modules (no pandas, numpy, etc.)?

Solution

You can do this in a single pass over L: for each timestamp, compute the index of the 30-minute interval it falls in, then increment the counter for that interval.

import datetime

L = ["20231017_021000", "20231017_021100", "20231017_021200", "20231017_052800", "20231017_093100", "20231017_093900"]
d = datetime.datetime.strptime("20231017_000000", "%Y%m%d_%H%M%S")

Y = [0] * 48

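for l in L:
    # Minutes elapsed since the base time; total_seconds() includes whole days,
    # whereas .seconds would silently wrap for events outside the base day
    diff = int((datetime.datetime.strptime(l, "%Y%m%d_%H%M%S") - d).total_seconds()) // 60

    # Find the interval (index in the result list)
    idx = diff // 30

    # Increment the count for that interval, ignoring events outside the 48 intervals
    if 0 <= idx < len(Y):
        Y[idx] += 1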

print(Y)
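
One difference from the code in the question: the loop above puts a timestamp that lies exactly on a boundary (e.g. 02:30:00) in the interval that starts there, whereas the test m1 < l <= m2 puts it in the interval that ends there. If that matters, here is a minimal sketch that keeps the question's convention, still in one pass. The helper name count_per_interval and its parameters are my own, not part of the question, and it assumes every timestamp follows the "%Y%m%d_%H%M%S" format.

```python
import datetime

FMT = "%Y%m%d_%H%M%S"

def count_per_interval(events, base, minutes=30, buckets=48):
    """Count events per (start, end] interval in a single pass (hypothetical helper)."""
    step = datetime.timedelta(minutes=minutes)
    counts = [0] * buckets
    for e in events:
        delta = datetime.datetime.strptime(e, FMT) - base
        # Ceiling division minus one: a timestamp exactly on a boundary goes
        # to the interval that ends there, matching m1 < l <= m2.
        idx = -(-delta // step) - 1
        # Skip events outside the covered range (including one exactly at base).
        if 0 <= idx < buckets:
            counts[idx] += 1
    return counts

L = ["20231017_021000", "20231017_021100", "20231017_021200",
     "20231017_052800", "20231017_093100", "20231017_093900"]
d = datetime.datetime.strptime("20231017_000000", FMT)

Y = count_per_interval(L, d)
# Show only the non-empty intervals as (index, count) pairs
print([(k, n) for k, n in enumerate(Y) if n])
# [(4, 3), (10, 1), (19, 2)]
```

Dividing one timedelta by another with // returns an integer, so no manual conversion to minutes is needed.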