TypeError: 'return_type' is an invalid keyword argument for split()


I have a DataFrame df with start and end time columns. Because these columns may contain gibberish values, I wrapped the conversion in try/except blocks. I'm trying to zero-pad the times so they are consistent and then store them as datetime.time values. Here's the code:

    for i in range(df.shape[0]):
        try:
            df.loc[i,'start time'] = pd.to_datetime(df.loc[i,'start time'].split(':', expand=True)
                                                     .apply(lambda col: col.str.zfill(2))
                                                     .fillna('00')
                                                     .agg(':'.join, axis=1)).dt.time
        except:
            pass
        try:
            df.loc[i,'end time'] = pd.to_datetime(df.loc[i,'end time'].str.split(':', expand=True)
                                                     .apply(lambda col: col.str.zfill(2))
                                                     .fillna('00')
                                                     .agg(':'.join, axis=1)).dt.time
        except:
            pass

But this code raises an error: TypeError: 'expand' is an invalid keyword argument for split()

What am I missing here?


Solution

  • You are confusing pd.Series.str.split with str.split. Because you iterate over the elements one by one, df.loc[i, 'start time'] is a plain Python string, not a Series, so you are calling the built-in str.split, which has no expand parameter:

    >>> '12:32:28'.split(':')
    ['12', '32', '28']
    
    >>> '12:32:28'.split(':', expand=True)
    ...
    TypeError: 'expand' is an invalid keyword argument for split()
    
    
    >>> df['start_time'].str.split(':')
    0      [2, 3, 4]
    1     [2, 5, 55]
    2     [2, 8, 46]
    3    [2, 11, 37]
    4    [2, 14, 28]
    Name: start_time, dtype: object
    
    >>> df['start_time'].str.split(':', expand=True)
       0   1   2
    0  2   3   4
    1  2   5  55
    2  2   8  46
    3  2  11  37
    4  2  14  28
    

    I think your code could simply be (without any loop):

    >>> pd.to_datetime(df['start_time'], format='%H:%M:%S').dt.time
    0    02:03:04
    1    02:05:55
    2    02:08:46
    3    02:11:37
    4    02:14:28
    Name: start_time, dtype: object
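
    Since the question mentions gibberish values, here is a minimal sketch of the same loop-free call with errors='coerce', so that unparseable entries become NaT instead of raising. The sample data below is made up for illustration and is not from the question:

```python
import pandas as pd

# Hypothetical sample data: two valid times and two gibberish/missing entries.
df = pd.DataFrame({"start_time": ["2:3:4", "2:11:37", "abc", None]})

# errors="coerce" turns anything that does not match the format into NaT
# instead of raising, so no per-row try/except is needed.
parsed = pd.to_datetime(df["start_time"], format="%H:%M:%S", errors="coerce")

# .dt.time yields datetime.time objects; rows that failed to parse stay NaT.
df["start_time"] = parsed.dt.time
print(df)
```

    Unlike a bare except: pass, this keeps the failed rows visible (as NaT), so they can be inspected or filtered afterwards with parsed.isna().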
    

    Input dataframe:

    >>> df
      start_time
    0      2:3:4
    1     2:5:55
    2     2:8:46
    3    2:11:37
    4    2:14:28
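
    If some values are also missing a component (for example '2:5' with no seconds), the padding chain from the question does work once it is applied to the whole column rather than to one string at a time. This is only a sketch under that assumption; the sample values are invented, and it assumes every row is otherwise colon-separated digits:

```python
import pandas as pd

# Hypothetical sample data: the second row has no seconds component.
df = pd.DataFrame({"start_time": ["2:3:4", "2:5", "14:28:7"]})

# Series.str.split accepts expand=True and returns one column per component;
# components that are missing come back as missing values.
parts = df["start_time"].str.split(":", expand=True)

# Zero-pad every component to two digits, then fill missing ones with "00".
padded = parts.apply(lambda col: col.str.zfill(2)).fillna("00")

# Re-join each row into an HH:MM:SS string and convert to datetime.time.
joined = padded.agg(":".join, axis=1)
df["start_time"] = pd.to_datetime(joined, format="%H:%M:%S").dt.time
print(df)
```

    The key difference from the loop version is that .str.split here is the Series accessor method, which is the one that understands expand=True.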