Tags: pandas, datetime, group-by, aggregate-functions, hour

Pandas aggregating hours in group by


I have this table:

Date      User ID  Calls  Hours
3-9-2022  2234     10     01:06:19
3-9-2022  2234     4      00:32:54

That I want to group into (resetting index as well):

Date      User ID  Calls  Hours
3-9-2022  2234     14     01:39:13

I am stuck on aggregating the hours. I read that the column first has to be converted to timedelta, but I am not sure how to apply that inside the group by (perhaps with a lambda?). Calls, by contrast, only needs a normal sum. Here is my current code:

df.groupby(['Date','UserID'])['Calls','Hours'].apply(lambda x : x.astype(int).sum()).reset_index()

Data type:
Date - Datetime
UserID - Int
Calls - Int
Hours - Datetime (Time)

Any help appreciated!

ANSWER

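import pandas as pd

# Convert the HH:MM:SS values to timedeltas so they can be summed.
df['Hours'] = pd.to_timedelta(df['Hours'])

df_group = df.groupby(['Date', 'User ID']).agg({'Calls':'sum', 'Hours':'sum'}).reset_index()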
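
The question lists Hours as a time type rather than a string. If the column really holds `datetime.time` objects, `pd.to_timedelta` will not accept them directly, so one common workaround is to cast to string first. The following is a minimal sketch under that assumption; the sample frame is rebuilt from the table in the question, and storing Hours as `datetime.time` is a guess about the asker's data.

```python
import datetime as dt

import pandas as pd

# Sample rows from the question. Storing Hours as datetime.time objects
# is an assumption made for this sketch.
df = pd.DataFrame({
    "Date": ["3-9-2022", "3-9-2022"],
    "User ID": [2234, 2234],
    "Calls": [10, 4],
    "Hours": [dt.time(1, 6, 19), dt.time(0, 32, 54)],
})

# str(datetime.time) gives "HH:MM:SS", which pd.to_timedelta can parse.
df["Hours"] = pd.to_timedelta(df["Hours"].astype(str))

# as_index=False keeps Date and User ID as ordinary columns.
df_group = (
    df.groupby(["Date", "User ID"], as_index=False)
      .agg({"Calls": "sum", "Hours": "sum"})
)
print(df_group)
```

The cast is harmless when the column already holds strings, so it can be left in either way. The summed Hours column stays a timedelta here; formatting it back to HH:MM:SS is a separate step.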

Solution

  • First convert the Hours column to timedeltas, sum within each group, and then convert the totals back to HH:MM:SS format:

    import pandas as pd

    def f(x):
        # Split a Timedelta into whole hours, minutes and seconds.
        ts = x.total_seconds()
        hours, remainder = divmod(ts, 3600)
        minutes, seconds = divmod(remainder, 60)
        return '{:02d}:{:02d}:{:02d}'.format(int(hours), int(minutes), int(seconds))


    # Convert to timedelta, sum per group, then format the totals as strings.
    out = (df.assign(Hours=pd.to_timedelta(df['Hours']))
             .groupby(['Date', 'User ID'], as_index=False)[['Calls', 'Hours']]
             .sum()
             .assign(Hours=lambda x: x['Hours'].apply(f)))
    print(out)
           Date  User ID  Calls     Hours
    0  3-9-2022     2234     14  01:39:13
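
One reason to format by hand with `divmod`, rather than relying on the default string form of a timedelta, is that a total can exceed 24 hours, and the default form then switches to a "1 days ..." prefix. The sketch below shows the same arithmetic in isolation; `format_hms` is a hypothetical name used here for illustration and is equivalent in spirit to `f` above.

```python
import pandas as pd


def format_hms(td):
    """Format a Timedelta as HH:MM:SS, letting the hour count exceed 24."""
    total = int(td.total_seconds())
    hours, remainder = divmod(total, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


short_total = format_hms(pd.Timedelta("01:39:13"))
long_total = format_hms(pd.Timedelta(hours=26, minutes=5))

print(short_total)  # 01:39:13
print(long_total)   # 26:05:00
```

Because the hours are never reduced modulo 24, a user who logged 26 hours across several rows shows up as 26:05:00 rather than as a day plus two hours.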