I have a dataset with a duration column whose values are stored as object (string) data, as shown below:
df['duration'].head(10)
0 60 min.
1 1 hr. 13 min.
2 1 hr. 10 min.
3 52 min.
4 1 hr. 25 min.
5 45 min.
6 45 min.
7 60 min.
8 45 min.
9 45 min.
Name: duration, dtype: object
How do I change this to an appropriate numerical value, like below?
0 00:60
1 01:13
2 01:10
3 00:52
4 01:25
5 00:45
Here is a way to get both a string version in %H:%M format and a timedelta version:
import pandas as pd
df = pd.DataFrame({'duration':['60 min.', '1 hr. 13 min.', '1 hr. 10 min.']})
print(df)
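# Collect the numbers in each string: ['60'] or ['1', '13'].
df['parts'] = df.duration.str.findall(r'\d+')
# Hours are present only when two numbers were found; the last number is always the minutes.
df['timedelta'] = df.parts.apply(lambda x: pd.to_timedelta((0 if len(x) < 2 else int(x[0])) * 3600 + int(x[-1]) * 60, unit='s'))
df['hours and minutes'] = df.parts.apply(lambda x: f"{0 if len(x) < 2 else int(x[0]):02}:{int(x[-1]):02}")
df = df.drop(columns=['duration', 'parts'])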
print(df)
Input:
duration
0 60 min.
1 1 hr. 13 min.
2 1 hr. 10 min.
Output:
         timedelta hours and minutes
0  0 days 01:00:00             00:60
1  0 days 01:13:00             01:13
2  0 days 01:10:00             01:10
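Note that the string column keeps the minutes exactly as written, so '60 min.' becomes 00:60 while the timedelta column shows 01:00:00. If a normalized string is preferred, one option is to format it from the timedelta instead. A minimal sketch, where format_hhmm is a hypothetical helper name and the sample values simply mirror the three rows above:

```python
import pandas as pd

def format_hhmm(td):
    """Format a Timedelta as zero-padded HH:MM (hypothetical helper)."""
    total_minutes = int(td.total_seconds() // 60)
    hours, minutes = divmod(total_minutes, 60)
    return f"{hours:02}:{minutes:02}"

# Sample durations of 60, 73 and 70 minutes, matching the rows above.
durations = pd.Series(pd.to_timedelta([60, 73, 70], unit="min"))
hhmm = durations.apply(format_hhmm)
print(hhmm.tolist())  # ['01:00', '01:13', '01:10']
```

Durations of 100 hours or more would simply print with more than two hour digits.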
If we do this:
print(df.timedelta.dtypes)
... we see that the timedelta column indeed contains numerical values (of timedelta data type):
timedelta64[ns]
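If a plain number is more useful than a timedelta (for sums or averages, say), a vectorized alternative is to pull the hour and minute parts out with str.extract and compute total minutes. This is only a sketch: the regular expression and the total_minutes column name are assumptions, and it expects every value to look like 'N hr. M min.' or 'M min.' with no leading whitespace.

```python
import pandas as pd

df = pd.DataFrame({"duration": ["60 min.", "1 hr. 13 min.", "1 hr. 10 min.", "52 min."]})

# Optional "N hr." part followed by an optional "M min." part (assumed format).
pattern = r"(?:(?P<hours>\d+)\s*hr\.)?\s*(?:(?P<minutes>\d+)\s*min\.)?"
parts = df["duration"].str.extract(pattern)

# Parts that are absent come back as missing values; treat them as 0.
hours = pd.to_numeric(parts["hours"]).fillna(0).astype(int)
minutes = pd.to_numeric(parts["minutes"]).fillna(0).astype(int)

df["total_minutes"] = hours * 60 + minutes
df["timedelta"] = pd.to_timedelta(df["total_minutes"], unit="min")
print(df)
```

Here total_minutes is an ordinary integer column, and the timedelta column is built from it in one call rather than row by row with apply.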