Tags: python, regex, pandas, dataframe, eval

How can I transform a column in Pandas using regex?


I have got a time column like this:

import pandas as pd
df = pd.DataFrame({'time': {0: '1 h 50 min', 1: '50 min', 2: '2 h 3 min', 3: '3 min'}})
        time
0  1 h 50 min
1      50 min
2   2 h 3 min
3       3 min

I want the information in seconds like:

time
6600
3000
7380
180

I tried to use regex and eval to compute the time in seconds, but without success. What is the best way to achieve this?


Solution

  • First use a regex to extract the hours and minutes, convert both to minutes and sum them, then multiply the result by 60 to get seconds.

    df['time'].str.findall(r'(\d+)\s*(h|min)').apply(lambda x: sum(int(t)*60 if u=='h' else int(t) for t,u in x))*60
    

    OUTPUT:

    0    6600
    1    3000
    2    7380
    3     180
    Name: time, dtype: int64
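
The same steps can also be written as a small named function, which is easier to read and test than the one-line lambda. This is only a sketch: `UNIT_PATTERN` and `to_seconds` are hypothetical names, and it assumes every value uses the units `h` and `min` exactly as in the question.

```python
import re

import pandas as pd

# Same pattern as above: a number followed by its unit, 'h' or 'min'.
UNIT_PATTERN = re.compile(r'(\d+)\s*(h|min)')


def to_seconds(text):
    """Convert a string such as '1 h 50 min' to a number of seconds."""
    minutes = 0
    for value, unit in UNIT_PATTERN.findall(text):
        # Hours count as 60 minutes each; minutes are added as they are.
        minutes += int(value) * 60 if unit == 'h' else int(value)
    return minutes * 60


df = pd.DataFrame({'time': ['1 h 50 min', '50 min', '2 h 3 min', '3 min']})
seconds = df['time'].apply(to_seconds)
print(seconds.tolist())  # [6600, 3000, 7380, 180]
```

A string with no recognizable unit gives 0 here rather than raising an error, which may or may not be what you want.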
    

    If you prefer, you can first extract the hours and minutes into separate columns:

    out = df.assign(hours=df['time'].str.extract(r'(\d+)\s*h', expand=False),
                    minutes=df['time'].str.extract(r'(\d+)\s*min', expand=False)).fillna(0)
    
             time hours minutes
    0  1 h 50 min     1      50
    1      50 min     0      50
    2   2 h 3 min     2       3
    3       3 min     0       3
    

    Now you can convert them to integers and compute the number of seconds:

    out['hours'].astype(int)*3600+out['minutes'].astype(int)*60
    
    0    6600
    1    3000
    2    7380
    3     180
    dtype: int32
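
The separate-column approach can also be done with a single `str.extract` call by using named groups, which become the column names. This is a sketch of a variation, not the code from the answer above: the combined pattern and the `seconds` column name are assumptions, and it expects the hours part (if present) to come before the minutes part.

```python
import pandas as pd

df = pd.DataFrame({'time': ['1 h 50 min', '50 min', '2 h 3 min', '3 min']})

# Both parts are optional, so a missing part comes back as NaN.
pattern = r'(?:(?P<hours>\d+)\s*h)?\s*(?:(?P<minutes>\d+)\s*min)?'
parts = df['time'].str.extract(pattern)

# Convert the extracted strings to numbers and treat missing parts as 0.
parts = parts.apply(pd.to_numeric).fillna(0).astype(int)

df['seconds'] = parts['hours'] * 3600 + parts['minutes'] * 60
print(df['seconds'].tolist())  # [6600, 3000, 7380, 180]
```

Going through `pd.to_numeric` before `fillna(0)` avoids mixing strings and integers in the same column.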