python, timestamp, python-datetime

Finding time intervals per day from a list of timestamps in Python


I am trying to compute time intervals per day from a list of Unix timestamps in Python. I have searched for similar questions on Stack Overflow but mostly found examples of computing deltas or SQL solutions.

I have a list of the sort:

timestamps = [1176239419.0, 1176334733.0, 1176445137.0, 1177619954.0, 1177620812.0, 1177621082.0, 1177838576.0, 1178349385.0, 1178401697.0, 1178437886.0, 1178926650.0, 1178982127.0, 1179130340.0, 1179263733.0, 1179264930.0, 1179574273.0, 1179671730.0, 1180549056.0, 1180763342.0, 1181386289.0, 1181990860.0, 1182979573.0, 1183326862.0]

I can easily turn this list of timestamps into datetime objects using:

import datetime as dt

[dt.datetime.fromtimestamp(int(i)) for i in timestamps]

From there I could probably write fairly lengthy code that keeps the first day/month and checks whether the next item in the list falls on the same day/month. If it does, I would look at the times, take the first and last of that day, and store the interval together with the day/month in a dictionary.

As I am fairly new to Python, I was wondering what the best way to do this is in Python without excessive use of if/else statements.

Thank you in advance


Solution

  • If the list is sorted, as in your case, then you can use itertools.groupby() to group the timestamps into days:

    #!/usr/bin/env python
    from datetime import date, timedelta
    from itertools import groupby
    
    epoch = date(1970, 1, 1)
    
    result = {}
    assert timestamps == sorted(timestamps)
    # ts // 86400 is the number of whole days since the epoch, so this groups by UTC day
    for day, group in groupby(timestamps, key=lambda ts: ts // 86400):
        # store the interval (first, last) for that day in a dictionary
        same_day = list(group)
        assert max(same_day) == same_day[-1] and min(same_day) == same_day[0]
        result[epoch + timedelta(day)] = same_day[0], same_day[-1]
    print(result)
    

    Output

    {datetime.date(2007, 4, 10): (1176239419.0, 1176239419.0),
     datetime.date(2007, 4, 11): (1176334733.0, 1176334733.0),
     datetime.date(2007, 4, 13): (1176445137.0, 1176445137.0),
     datetime.date(2007, 4, 26): (1177619954.0, 1177621082.0),
     datetime.date(2007, 4, 29): (1177838576.0, 1177838576.0),
     datetime.date(2007, 5, 5): (1178349385.0, 1178401697.0),
     datetime.date(2007, 5, 6): (1178437886.0, 1178437886.0),
     datetime.date(2007, 5, 11): (1178926650.0, 1178926650.0),
     datetime.date(2007, 5, 12): (1178982127.0, 1178982127.0),
     datetime.date(2007, 5, 14): (1179130340.0, 1179130340.0),
     datetime.date(2007, 5, 15): (1179263733.0, 1179264930.0),
     datetime.date(2007, 5, 19): (1179574273.0, 1179574273.0),
     datetime.date(2007, 5, 20): (1179671730.0, 1179671730.0),
     datetime.date(2007, 5, 30): (1180549056.0, 1180549056.0),
     datetime.date(2007, 6, 2): (1180763342.0, 1180763342.0),
     datetime.date(2007, 6, 9): (1181386289.0, 1181386289.0),
     datetime.date(2007, 6, 16): (1181990860.0, 1181990860.0),
     datetime.date(2007, 6, 27): (1182979573.0, 1182979573.0),
     datetime.date(2007, 7, 1): (1183326862.0, 1183326862.0)}
    

    If there is only one timestamp on a given day, then it appears as both the start and the end of that day's interval.
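If the list is not sorted, groupby() would split one day into several groups. A minimal sketch that does not rely on ordering, assuming days are counted in UTC as in the groupby() version; the function name intervals_per_day and its tz parameter are hypothetical, introduced only for illustration:

```python
from collections import defaultdict
from datetime import datetime, timezone


def intervals_per_day(timestamps, tz=timezone.utc):
    """Map each calendar day (in tz) to the (first, last) timestamp of that day."""
    by_day = defaultdict(list)
    for ts in timestamps:
        # Convert to an aware datetime, then keep only the calendar date.
        by_day[datetime.fromtimestamp(ts, tz=tz).date()].append(ts)
    # min/max make the result independent of the input order.
    return {day: (min(group), max(group)) for day, group in by_day.items()}


# A few values from the question, deliberately shuffled.
sample = [1177621082.0, 1176239419.0, 1177619954.0, 1177620812.0]
print(intervals_per_day(sample))
```

Passing a different tzinfo as tz should group by that zone's calendar days instead, which may matter if the timestamps are meant to be read in local time.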

    How would you then test whether the last (for example) 5 entries in the result have a larger interval than the previous 14?

    entries = sorted(result.items())
    intervals = [(end - start) for _, (start, end) in entries]
    print(max(intervals[-5:]) > max(intervals[-5-14:-5]))
    # -> False
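The same comparison can be wrapped in a small helper so the window sizes are parameters. This is only a sketch: the name recent_max_exceeds is hypothetical, it assumes "larger interval" means the largest single interval in each window (as in the snippet above), and it assumes recent is at least 1:

```python
def recent_max_exceeds(result, recent=5, previous=14):
    """True if the longest interval in the last `recent` days exceeds
    the longest interval in the `previous` days before them."""
    # Sort by day so "last" and "previous" refer to calendar order.
    durations = [end - start for _, (start, end) in sorted(result.items())]
    recent_part = durations[-recent:]
    previous_part = durations[-(recent + previous):-recent]
    # default=0 avoids a ValueError when a window is empty (too few days).
    return max(recent_part, default=0) > max(previous_part, default=0)


# Toy data keyed by day number; any sortable day key works.
toy = {1: (0.0, 10.0), 2: (5.0, 5.0), 3: (0.0, 30.0), 4: (7.0, 7.0)}
print(recent_max_exceeds(toy, recent=2, previous=2))
```

If a total or an average per window is what you are after, replacing max with sum (or a mean) in the last line would be the corresponding change.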