I have a df with start and end time columns. Since these columns might contain gibberish values, I have wrapped the conversion in try/except blocks. I'm trying to zero-pad the times to make them consistent and then save them as datetime.time values. Here's the code:
for i in range(df.shape[0]):
    try:
        df.loc[i,'start time'] = pd.to_datetime(df.loc[i,'start time'].split(':', expand=True)
                                                .apply(lambda col: col.str.zfill(2))
                                                .fillna('00')
                                                .agg(':'.join, axis=1)).dt.time
    except:
        pass
    try:
        df.loc[i,'end time'] = pd.to_datetime(df.loc[i,'end time'].str.split(':', expand=True)
                                              .apply(lambda col: col.str.zfill(2))
                                              .fillna('00')
                                              .agg(':'.join, axis=1)).dt.time
    except:
        pass
But this piece of code gives an error: TypeError: 'expand' is an invalid keyword argument for split()
What am I missing here?
You are confusing pd.Series.str.split with str.split. Because you iterate through the elements one by one, df.loc[i, 'start time'] returns a single Python string, not a Series, so you end up calling the built-in str.split, which has no expand argument:
>>> '12:32:28'.split(':')
['12', '32', '28']
>>> '12:32:28'.split(':', expand=True)
...
TypeError: 'expand' is an invalid keyword argument for split()
>>> df['start_time'].str.split(':')
0      [2, 3, 4]
1     [2, 5, 55]
2     [2, 8, 46]
3    [2, 11, 37]
4    [2, 14, 28]
Name: start_time, dtype: object
>>> df['start_time'].str.split(':', expand=True)
   0   1   2
0  2   3   4
1  2   5  55
2  2   8  46
3  2  11  37
4  2  14  28
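To illustrate, the padding chain from the question does run once it is applied to the whole column instead of to a single cell. This is only a minimal sketch: the column name start_time follows my sample frame, and the values (including '7:15', which has no seconds part) are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data; '7:15' is missing its seconds part.
df = pd.DataFrame({'start_time': ['2:3:4', '2:11:37', '7:15']})

padded = (
    df['start_time']
    .str.split(':', expand=True)           # Series.str.split: one column per part
    .apply(lambda col: col.str.zfill(2))   # left-pad every part to two digits
    .fillna('00')                          # parts that were missing become '00'
    .agg(':'.join, axis=1)                 # join the parts back together row by row
)
print(padded.tolist())
```

The result is a Series of zero-padded strings, which can then be passed to pd.to_datetime as a whole.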
I think your code could be simplified to this, without any loop:
>>> pd.to_datetime(df['start_time'], format='%H:%M:%S').dt.time
0 02:03:04
1 02:05:55
2 02:08:46
3 02:11:37
4 02:14:28
Name: start_time, dtype: object
Input dataframe:
>>> df
  start_time
0      2:3:4
1     2:5:55
2     2:8:46
3    2:11:37
4    2:14:28
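Since the question mentions possible gibberish values, one option that avoids the try/except blocks is the errors='coerce' argument of pd.to_datetime, which turns unparseable entries into NaT instead of raising. A minimal sketch, assuming the same start_time column; the sample values, with 'abc' standing in for gibberish, and the names parsed and bad_rows are hypothetical:

```python
import pandas as pd

# Hypothetical sample data; 'abc' stands in for a gibberish value.
df = pd.DataFrame({'start_time': ['2:3:4', 'abc', '2:14:28']})

# errors='coerce' yields NaT for values that do not match the format
parsed = pd.to_datetime(df['start_time'], format='%H:%M:%S', errors='coerce')

# Flag the rows that could not be parsed, then keep only the time part
bad_rows = parsed.isna()
df['start_time'] = parsed.dt.time

print(df)
print(bad_rows.tolist())
```

The bad_rows mask lets you inspect or drop the invalid entries explicitly, instead of silently skipping them as a bare except would.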