python pandas duration iso8601 isodate

Convert Pandas Column of ISO 8601 Data to Seconds


I'm trying to convert a pandas DataFrame column of ISO 8601 durations to total seconds. The values look like PT7M7S, which means 7 minutes and 7 seconds. isodate 0.5.4 parses a single string fine, e.g. isodate.parse_duration("PT7M7S"), but I need to run the same call on a whole pandas column and have no idea how. I tried isodate.parse_duration(df['duration']), but it raises TypeError: Expecting a string.

The following code creates a test dataframe:

import pandas as pd

df = ["PT7M7S", "PT7M14S", "PT6M45S"]
df = pd.DataFrame.from_dict(df)
# Rename the default column 0 to 'duration'
names = df.columns.tolist()
names[names.index(0)] = 'duration'
df.columns = names

This is the code I tried, which did not work:

import isodate
dur = isodate.parse_duration(df['duration'])
df['duration'] = dur.total_seconds()

Ideally, the duration column would hold the total seconds for each row. For example, the first row would read 427 instead of PT7M7S.

I appreciate any help. Thanks.


Solution

  • You could use str.extract with a regex to pull the minutes and seconds out of the strings:

    import pandas as pd
    
    df = pd.DataFrame({'duration': ["PT7M7S", "PT7M14S", "PT6M45S"]})
    # Capture the minutes and seconds digits, then cast both columns to int
    df[['minutes','seconds']] = df['duration'].str.extract(r'PT(\d+)M(\d+)S', expand=True).astype('int')
    df['total_seconds'] = 60*df['minutes'] + df['seconds']
    

    yields

      duration  minutes  seconds  total_seconds
    0   PT7M7S        7        7            427
    1  PT7M14S        7       14            434
    2  PT6M45S        6       45            405
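
The pattern above only matches strings that contain both a minutes and a seconds part, so a value such as PT45S or PT1H2M3S would not parse. A minimal sketch of one way to loosen it, assuming the durations only ever use whole-number hours, minutes and seconds (no days, weeks or fractional seconds); the extra sample rows and the names pattern and parts are illustrative, not from the question:

```python
import pandas as pd

# Hypothetical sample data: the hour-only and second-only rows are
# illustrative additions, not values from the question.
df = pd.DataFrame({'duration': ["PT7M7S", "PT1H2M3S", "PT45S", "PT2H"]})

# Each component is an optional named group, so a missing part comes back as NaN.
pattern = r'^PT(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?(?:(?P<seconds>\d+)S)?$'

# Treat missing components as zero, then cast the digit strings to int.
parts = df['duration'].str.extract(pattern).fillna('0').astype(int)

df['total_seconds'] = 3600*parts['hours'] + 60*parts['minutes'] + parts['seconds']
```

Named groups make str.extract label the resulting columns, which keeps the arithmetic readable. If the data can contain the wider ISO 8601 duration syntax, applying a dedicated parser such as isodate.parse_duration to each element of the column is probably the safer route.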