I have a csv with a column that represents time durations of two discrete events.
Day,Duration
Mon,"S: 3h0s, P: 18m0s"
Tues,"S: 3h0s, P: 18m0s"
Wed,"S: 4h0s, P: 18m0s"
Thurs,"S: 30h, P: 10m0s"
Fri,"S: 15m, P: 3h0s"
I want to split that duration into two distinct columns and consistently represent the time in minutes
. Right now, it is shown in hours
, minutes
, and seconds
, like S: 3h0s, P: 18m0s
. So the output should look like this:
Day Duration S(min) P(min)
0 Mon S: 3h0s, P: 18m0s 180 18
1 Tues S: 3h0s, P: 18m0s 180 18
2 Wed S: 4h0s, P: 18m0s 240 18
3 Thur S: 30h0s, P: 10m0s 1800 10
4 Fri S: 15m, P: 3h0s 15 180
But when I do in str.replace
import pandas as pd
df = pd.read_csv("/file.csv")
df["S(min)"] = df['Duration'].str.split(',').str[0]
df["P(min)"] = df['Duration'].str.split(',').str[-1]
df['S(min)'] = df['S(min)'].str.replace("S: ", '').str.replace("h", '*60').str.replace('m','*1').str.replace('s','*(1/60)').apply(eval)
df['P(min)'] = df['P(min)'].str.replace("P: ", '').str.replace("h", '*60').str.replace('m','*1').str.replace('s','*(1/60)').apply(eval)
some of the calculations are off:
Day Duration S(min) P(min)
0 Mon S: 3h0s, P: 18m0s 30.0 3.000000
1 Tues S: 3h0s, P: 18m0s 30.0 3.000000
2 Wed S: 4h0s, P: 18m0s 40.0 3.000000
3 Thurs S: 30h, P: 10m0s 1800.0 1.666667
4 Fri S: 15m, P: 3h0s 15.0 30.000000
Using regex and pd.to_timedelta
df[['S', 'P']] = df['Duration'].str.extract(r'(S: .*?), P:( .*)')
df['S(min)'] = pd.to_timedelta(df['Duration'].str.replace('[SP]: ', '', regex=True).str.split(',').str[0]).dt.total_seconds() / 60
df['P(min)'] = pd.to_timedelta(df['Duration'].str.replace('[SP]: ', '', regex=True).str.split(',').str[-1]).dt.total_seconds() / 60
df.drop(['S', 'P'], axis=1, inplace=True)
print(df)
More Simplified approach with different regex pattern:
df[['S', 'P']] = df['Duration'].str.extract(r'S: (.*?), P: (.*)')
df['S'] = pd.to_timedelta(df['S']).dt.total_seconds()/60
df['P'] = pd.to_timedelta(df['P']).dt.total_seconds()/60
df = df.rename(columns = {'S': 'S(min)', 'P': 'P(min)'})
print(df)
Day Duration S(min) P(min)
0 Mon S: 3h0s, P: 18m0s 180.0 18.0
1 Tues S: 3h0s, P: 18m0s 180.0 18.0
2 Wed S: 4h0s, P: 18m0s 240.0 18.0
3 Thur S: 30h0s, P: 10m0s 1800.0 10.0
4 Fri S: 15m, P: 3h0s 15.0 180.0