Search code examples
pythonpandastimedelta

Pandas rolling sum for time


I'm trying to get a rolling sum of time values in a dataframe that looks like this:

    RunTime
0  00:51:25
1       NaT
2  00:42:16
3       NaT
4  00:40:15
5       NaT
6  00:50:13
7  00:53:28
8       NaT
9  00:37:32
10      NaT
11 01:53:22
12 01:08:22
13 00:59:57
14 00:12:22

Expected output:

     RunTime  RunTime_MS
0   00:51:25    
1   NaT         
2   00:42:16    
3   NaT         
4   00:40:15    
5   NaT         
6   00:50:13    3:04:09
7   00:53:28    3:06:12
8   NaT         3:06:12
9   00:37:32    3:01:28
10  NaT         3:01:28
11  01:53:22    4:14:35
12  01:08:22    5:22:57
13  00:59:57    5:32:41
14  00:12:22    4:51:35

For other columns (containing floats) in the dataframe i'm using

dfExt['Distance_MS'] = dfExt['Distance'].fillna(value=0).rolling(window=7).sum()

which is working just fine. If I try to do this on the time column I get the error

ops for Rolling for this dtype timedelta64[ns] are not implemented

eventhough the documentation seems to indicate that .sum() is something you can do on timedelta.

This is example code:

import pandas as pd
from datetime import datetime, timedelta

RunTimeValues = ['00:51:25','','00:42:16','','00:40:15','','00:50:13','00:53:28','','00:37:32','','01:53:22','01:08:22','00:59:57','00:12:22']
for i in range(len(RunTimeValues)):
    if RunTimeValues[i] != '':
        #RunTimeValues[i] = datetime.strptime(RunTimeValues[i], "%H:%M:%S")
        t = datetime.strptime(RunTimeValues[i],"%H:%M:%S")
        RunTimeValues[i] = timedelta(hours=t.hour, minutes=t.minute, seconds=t.second)
dfExt = pd.DataFrame({'RunTime': RunTimeValues})
dfExt['RunTime_MS'] = dfExt['RunTime'].fillna(value=0).rolling(window=7).sum()
print(dfExt)

I know I can convert the timedeltas to hours in floats and then do the rolling sum, but that result is not exactly what I want. Any suggestions?


Solution

  • This will do it:

    dfExt['RunTime_MS'] = pd.to_timedelta(dfExt['RunTime'].fillna(0).dt.total_seconds().rolling(window=7).sum(), unit='s')
    print(dfExt)
        RunTime RunTime_MS
    0  00:51:25        NaT
    1       NaT        NaT
    2  00:42:16        NaT
    3       NaT        NaT
    4  00:40:15        NaT
    5       NaT        NaT
    6  00:50:13   03:04:09
    7  00:53:28   03:06:12
    8       NaT   03:06:12
    9  00:37:32   03:01:28
    10      NaT   03:01:28
    11 01:53:22   04:14:35
    12 01:08:22   05:22:57
    13 00:59:57   05:32:41
    14 00:12:22   04:51:35