python performance loops datetime intervals

Count the events in each 30-minute interval without looping over the events many times


This works and prints the number of events in each 30-minute interval:

00:00 to 00:30, 00:30 to 01:00, ..., 23:30 to 24:00

import datetime
L = ["20231017_021000", "20231017_021100", "20231017_021200", "20231017_052800", "20231017_093100", "20231017_093900"]
d = datetime.datetime.strptime("20231017_000000", "%Y%m%d_%H%M%S")
M = [(d + datetime.timedelta(minutes=30*k)).strftime("%Y%m%d_%H%M%S") for k in range(49)]
Y = [sum([m1 < l <= m2 for l in L]) for m1, m2 in zip(M, M[1:])]
print(Y)
# [0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
# => 3 events between 02:00 and 02:30
# => 1 event between 05:00 and 05:30
# => 2 events between 09:30 and 10:00

Problem: it loops over the list L 48 times, and L can be long.

How can I do the same in a single pass over L, using only Python's built-in modules (no pandas, numpy, etc.)?


Solution

  • You can do this in a single pass over L: compute the interval index for each timestamp in L, then increment the counter for that interval.

    import datetime
    
    L = ["20231017_021000", "20231017_021100", "20231017_021200", "20231017_052800", "20231017_093100", "20231017_093900"]
    d = datetime.datetime.strptime("20231017_000000", "%Y%m%d_%H%M%S")
    
    Y = [0] * 48  # one counter per 30-minute interval
    
    for l in L:
        # Minutes since midnight. d is at 00:00:00 and .seconds drops the days part
        # of the timedelta, so only the time of day matters here.
        diff = (datetime.datetime.strptime(l, "%Y%m%d_%H%M%S") - d).seconds // 60
    
        # Find the interval (index in the result list)
        idx = diff // 30
    
        # Increment the count for that interval
        Y[idx] += 1
    
    print(Y)
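
The same idea can be wrapped in a small reusable function. This is a minimal sketch, not the answerer's code: the names `count_per_interval`, `FMT` and the `minutes` parameter are assumptions introduced for illustration. It takes the minute of the day directly from the parsed timestamp, so no base time is needed. Note that, like the loop above, it uses intervals that include their start and exclude their end, so an event at exactly 02:30:00 lands in the 02:30 to 03:00 slot, whereas the `m1 < l <= m2` test in the question would put it in 02:00 to 02:30.

```python
import datetime

FMT = "%Y%m%d_%H%M%S"  # timestamp format used in the question


def count_per_interval(timestamps, minutes=30):
    """Count timestamps per fixed-width slot of the day in a single pass.

    Hypothetical helper: slots are [start, end), and the date part of
    each timestamp is ignored (only the time of day is used).
    """
    counts = [0] * ((24 * 60) // minutes)
    for ts in timestamps:
        t = datetime.datetime.strptime(ts, FMT)
        minute_of_day = t.hour * 60 + t.minute
        counts[minute_of_day // minutes] += 1
    return counts


L = ["20231017_021000", "20231017_021100", "20231017_021200",
     "20231017_052800", "20231017_093100", "20231017_093900"]
Y = count_per_interval(L)

# Show only the non-empty slots as {slot index: count}
print({i: n for i, n in enumerate(Y) if n})
# {4: 3, 10: 1, 19: 2}
```

Slot index `i` covers the 30 minutes starting at `i * 30` minutes after midnight, so index 4 is 02:00 to 02:30, index 10 is 05:00 to 05:30 and index 19 is 09:30 to 10:00, matching the counts in the question.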