python pandas dataframe timedelta

Converting dataframe column from object to timedelta and summing


I have a pandas dataframe that I'm splitting up by month:

months = [g for n, g in df.groupby(pd.Grouper(key='DATE',freq='M'))]

I'm then taking each month and summing the integers in a column called PARTS RUN.

parts_run_month_sum = months[month]['PARTS RUN'].sum()

This is all working fine. The last thing I need to do is add together the hours from the column HOURS RUN (HR:MIN). This column has the object data type. The data itself is in a timedelta format, not datetime. The format looks like 02:11:40, meaning hours:minutes:seconds.

If I run the code below, it prints the correct index numbers for the number of rows in each month:

for run_time in range(len(months[month]['HOURS RUN (HR:MIN)'])):
    print(run_time)

But if I try to print the times themselves, I get a KeyError: 0, even though 0 is among the index numbers that the loop above prints for each month.

for run_time in range(len(months[month]['HOURS RUN (HR:MIN)'])):
    print(months[month]['HOURS RUN (HR:MIN)'][run_time])
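A likely cause of the KeyError, sketched on made-up data: each monthly group keeps the row labels of the original dataframe, so `series[0]` looks for the label 0, which only the group containing the first row has. The sample values below are hypothetical, and `dt.to_period('M')` stands in for the `pd.Grouper` call only to keep the sketch short; it assumes the dataframe has a default integer index.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "DATE": pd.to_datetime(["2021-01-05", "2021-02-03", "2021-02-17"]),
    "HOURS RUN (HR:MIN)": ["02:11:40", "01:30:00", "00:15:00"],
})

# One sub-frame per calendar month, similar to the list built in the question.
months = [g for _, g in df.groupby(df["DATE"].dt.to_period("M"))]
feb = months[1]["HOURS RUN (HR:MIN)"]

labels = list(feb.index)     # row labels inherited from df, not 0..n-1
first_value = feb.iloc[0]    # positional access, independent of the labels
values = list(feb)           # iterating the Series yields the values directly
# feb[0] would raise KeyError: 0 here, because no row in this group has label 0.
```

Iterating over the Series itself, or using `.iloc`, avoids the label lookup entirely.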

What I'm actually looking for is how to sum the column of times, but because they are objects I cannot do this.

How can I convert a column with format hours:minutes:seconds to timedelta and sum the times?


Solution

  • I think you need to convert the column with pd.to_timedelta first:

    df['HOURS RUN (HR:MIN)'] = pd.to_timedelta(df['HOURS RUN (HR:MIN)'])
    
    # if the values are time objects (e.g. datetime.time) rather than strings, cast to str first
    df['HOURS RUN (HR:MIN)'] = pd.to_timedelta(df['HOURS RUN (HR:MIN)'].astype(str))
    

    Then, instead of splitting the dataframe into a list of months, it should be possible to aggregate with sum directly:

    df1 = df.groupby(pd.Grouper(key='DATE',freq='M'))['HOURS RUN (HR:MIN)'].sum()
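The steps above can be sketched end to end on a small made-up dataframe. The sample rows are hypothetical; only the column names come from the question. Grouping by `dt.to_period('M')` is used here as an assumption-free stand-in for `pd.Grouper(key='DATE', freq='M')`, since the month-end alias for `freq` differs between pandas versions.

```python
import datetime
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "DATE": pd.to_datetime(["2021-01-05", "2021-01-20", "2021-02-03"]),
    "PARTS RUN": [10, 20, 5],
    "HOURS RUN (HR:MIN)": ["02:11:40", "00:48:20", "01:30:00"],
})

# Strings in hours:minutes:seconds form parse directly into timedeltas.
df["HOURS RUN (HR:MIN)"] = pd.to_timedelta(df["HOURS RUN (HR:MIN)"])

# If the column held datetime.time objects instead, cast to str before parsing.
as_times = pd.Series([datetime.time(2, 11, 40), datetime.time(0, 48, 20)])
from_times = pd.to_timedelta(as_times.astype(str))

# Sum both columns per calendar month.
by_month = df.groupby(df["DATE"].dt.to_period("M"))
hours_by_month = by_month["HOURS RUN (HR:MIN)"].sum()
parts_by_month = by_month["PARTS RUN"].sum()

# Optional: express the monthly totals as decimal hours.
decimal_hours = hours_by_month.dt.total_seconds() / 3600
```

Once the column has a timedelta dtype, `sum()` works on it the same way it does on the integer PARTS RUN column, so no explicit loop over the rows is needed.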