I'm trying to convert a pandas DataFrame column of ISO 8601 durations to total seconds. The values look like PT7M7S, which means 7 minutes and 7 seconds. isodate 0.5.4 parses a single string fine if I call isodate.parse_duration("PT7M7S"). However, I need to run the same conversion on a whole pandas column and have no idea how. I tried isodate.parse_duration(df['duration']), but it raises TypeError: Expecting a string.
import isodate
import pandas as pd

df = pd.DataFrame({'duration': ["PT7M7S", "PT7M14S", "PT6M45S"]})

# This is the call that fails: parse_duration gets the whole Series, not a string
dur = isodate.parse_duration(df['duration'])
df['duration'] = dur.total_seconds()
Ideally, the duration column would hold the total seconds for each row. For example, instead of PT7M7S, the first row would read 427.
I appreciate any help. Thanks.
You could use str.extract to parse the strings, assuming every value has the form PT<minutes>M<seconds>S:
import pandas as pd
df = pd.DataFrame({'duration': ["PT7M7S", "PT7M14S", "PT6M45S"]})
df[['minutes','seconds']] = df['duration'].str.extract(r'PT(\d+)M(\d+)S', expand=True).astype('int')
df['total_seconds'] = 60*df['minutes'] + df['seconds']
yields
  duration  minutes  seconds  total_seconds
0   PT7M7S        7        7            427
1  PT7M14S        7       14            434
2  PT6M45S        6       45            405
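The pattern above only matches strings that contain both a minutes and a seconds part. If the column might also hold values such as PT1H2M3S or PT45S, one possible variation is to make each component optional. This is only a sketch: the extra sample rows, the named groups, and the `parts` variable are illustrative assumptions, and it still ignores days and fractional seconds.

```python
import pandas as pd

# Sample data; the last two rows are hypothetical additions for illustration
df = pd.DataFrame(
    {"duration": ["PT7M7S", "PT7M14S", "PT6M45S", "PT1H2M3S", "PT45S"]}
)

# Each component is optional; named groups become the column names
pattern = (
    r"^PT(?:(?P<hours>\d+)H)?"
    r"(?:(?P<minutes>\d+)M)?"
    r"(?:(?P<seconds>\d+)S)?$"
)

# Missing components come back as NaN, so fill them with "0" before casting
parts = df["duration"].str.extract(pattern).fillna("0").astype(int)

df["total_seconds"] = (
    3600 * parts["hours"] + 60 * parts["minutes"] + parts["seconds"]
)
print(df)
```

Note that a string that does not match the pattern at all would also end up as 0 here, so it may be worth validating the column first. If you would rather keep isodate, parse_duration has to be called once per string, not on the whole Series. Something like df['duration'].apply(lambda s: isodate.parse_duration(s).total_seconds()) should do that and avoid the TypeError.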