I have a Pandas dataframe that I'm segregating by month:
months = [g for n, g in df.groupby(pd.Grouper(key='DATE',freq='M'))]
I then take each month and sum the integers in a column called PARTS RUN.
parts_run_month_sum = months[month]['PARTS RUN'].sum()
This all works fine. The last thing I need to do is add together the hours in the column HOURS RUN (HR:MIN). This column has an object data type. The data itself is in a timedelta format, not datetime. The format looks like 02:11:40, being hours:minutes:seconds.
If I run the code below, it prints the correct index numbers for however many rows there are in each month:
for run_time in range(len(months[month]['HOURS RUN (HR:MIN)'])):
    print(run_time)
But if I try to get the time values themselves, I receive a KeyError: 0, although the example above returns a 0 for each month.
for run_time in range(len(months[month]['HOURS RUN (HR:MIN)'])):
    print(months[month]['HOURS RUN (HR:MIN)'][run_time])
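A likely cause of the KeyError, sketched on made-up data: each monthly group keeps the row labels of the original dataframe, and plain [] indexing with an integer on such a column looks up a label rather than a position, so only the group that actually contains label 0 can answer [0]. The sample values below are hypothetical, and the grouping uses df['DATE'].dt.to_period('M') instead of pd.Grouper purely to keep the sketch short; that swap is an assumption of this example, not part of the original code.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    'DATE': pd.to_datetime(['2021-01-05', '2021-01-20',
                            '2021-02-03', '2021-02-17']),
    'HOURS RUN (HR:MIN)': ['02:11:40', '01:00:00', '00:30:15', '03:45:05'],
})

# One dataframe per calendar month, in chronological order.
months = [g for _, g in df.groupby(df['DATE'].dt.to_period('M'))]
february = months[1]['HOURS RUN (HR:MIN)']

# The group still carries the row labels of the original dataframe.
print(february.index.tolist())

# Positional access with .iloc works regardless of the labels.
for pos in range(len(february)):
    print(february.iloc[pos])

# Iterating over the values directly avoids indexing altogether.
for run_time in february:
    print(run_time)
```

Here february[0] would raise KeyError: 0, because the February rows are labelled 2 and 3.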
What I'm actually looking for is how to sum the column of times, but because they are objects I cannot do this.
How can I convert a column with the format hours:minutes:seconds to timedelta and sum the times?
I think you need:
df['HOURS RUN (HR:MIN)'] = pd.to_timedelta(df['HOURS RUN (HR:MIN)'])
# if the values are time objects rather than strings, convert them to str first
df['HOURS RUN (HR:MIN)'] = pd.to_timedelta(df['HOURS RUN (HR:MIN)'].astype(str))
Then, instead of your solution, I think you can aggregate with sum:
df1 = df.groupby(pd.Grouper(key='DATE',freq='M'))['HOURS RUN (HR:MIN)'].sum()
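As a minimal end-to-end sketch of the approach above, assuming made-up sample rows: the strings are converted with pd.to_timedelta and then both columns are summed per month in one pass. The data is hypothetical, and the grouping key df['DATE'].dt.to_period('M') is swapped in here for pd.Grouper as an assumption of this example; the pd.Grouper form from the answer should give the same sums.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    'DATE': pd.to_datetime(['2021-01-05', '2021-01-20',
                            '2021-02-03', '2021-02-17']),
    'PARTS RUN': [10, 20, 30, 40],
    'HOURS RUN (HR:MIN)': ['02:11:40', '01:00:00', '00:30:15', '03:45:05'],
})

# Parse the hours:minutes:seconds strings into timedelta64 values.
df['HOURS RUN (HR:MIN)'] = pd.to_timedelta(df['HOURS RUN (HR:MIN)'])

# Sum parts and run time per calendar month.
monthly = (
    df.groupby(df['DATE'].dt.to_period('M'))[['PARTS RUN', 'HOURS RUN (HR:MIN)']]
      .sum()
)
print(monthly)
```

With these sample rows, January adds up to 3 hours 11 minutes 40 seconds and February to 4 hours 15 minutes 20 seconds.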