Search code examples
pythonpandasdataframetimedelta

Convert hours-minutes-seconds duration in dataframe to minutes


I have a csv with a column that represents time durations of two discrete events.

Day,Duration
Mon,"S: 3h0s, P: 18m0s"
Tues,"S: 3h0s, P: 18m0s"
Wed,"S: 4h0s, P: 18m0s"
Thurs,"S: 30h, P: 10m0s"
Fri,"S: 15m, P: 3h0s"

I want to split that duration into two distinct columns and consistently represent the time in minutes. Right now, it is shown in hours, minutes, and seconds, like S: 3h0s, P: 18m0s. So the output should look like this:

    Day             Duration   S(min) P(min)
0   Mon    S: 3h0s, P: 18m0s     180     18
1  Tues    S: 3h0s, P: 18m0s     180     18
2   Wed    S: 4h0s, P: 18m0s     240     18
3  Thur   S: 30h0s, P: 10m0s    1800     10
4   Fri      S: 15m, P: 3h0s      15    180

But when I do in str.replace

import pandas as pd

df = pd.read_csv("/file.csv")
df["S(min)"] = df['Duration'].str.split(',').str[0]
df["P(min)"] = df['Duration'].str.split(',').str[-1]

df['S(min)'] = df['S(min)'].str.replace("S: ", '').str.replace("h", '*60').str.replace('m','*1').str.replace('s','*(1/60)').apply(eval)
df['P(min)'] = df['P(min)'].str.replace("P: ", '').str.replace("h", '*60').str.replace('m','*1').str.replace('s','*(1/60)').apply(eval)

some of the calculations are off:

     Day           Duration  S(min)     P(min)
0    Mon  S: 3h0s, P: 18m0s    30.0   3.000000
1   Tues  S: 3h0s, P: 18m0s    30.0   3.000000
2    Wed  S: 4h0s, P: 18m0s    40.0   3.000000
3  Thurs   S: 30h, P: 10m0s  1800.0   1.666667
4    Fri    S: 15m, P: 3h0s    15.0  30.000000

Solution

  • Using regex and pd.to_timedelta

    df[['S', 'P']] = df['Duration'].str.extract(r'(S: .*?), P:( .*)')
    df['S(min)'] = pd.to_timedelta(df['Duration'].str.replace('[SP]: ', '', regex=True).str.split(',').str[0]).dt.total_seconds() / 60
    df['P(min)'] = pd.to_timedelta(df['Duration'].str.replace('[SP]: ', '', regex=True).str.split(',').str[-1]).dt.total_seconds() / 60
    
    df.drop(['S', 'P'], axis=1, inplace=True)
    print(df)
    

    More Simplified approach with different regex pattern:

    df[['S', 'P']] = df['Duration'].str.extract(r'S: (.*?), P: (.*)')
    df['S'] = pd.to_timedelta(df['S']).dt.total_seconds()/60
    df['P'] = pd.to_timedelta(df['P']).dt.total_seconds()/60
    
    df = df.rename(columns = {'S': 'S(min)', 'P': 'P(min)'})
    print(df)
    

        Day            Duration  S(min)  P(min)
    0   Mon   S: 3h0s, P: 18m0s   180.0    18.0
    1  Tues   S: 3h0s, P: 18m0s   180.0    18.0
    2   Wed   S: 4h0s, P: 18m0s   240.0    18.0
    3  Thur  S: 30h0s, P: 10m0s  1800.0    10.0
    4   Fri     S: 15m, P: 3h0s    15.0   180.0