python pandas datetime dataframe

How to remove seconds from datetime?


I have the following dates, and I tried the following code:

df['start_date_time'] = [
    "2016-05-19 08:25:00", "2016-05-19 16:00:00", "2016-05-20 07:45:00",
    "2016-05-24 12:50:00", "2016-05-25 23:00:00", "2016-05-26 19:45:00"]
df['start_date_time'] = pd.to_datetime(df['start_date_time']).replace(second=0)

I get the following error:

TypeError: replace() got an unexpected keyword argument 'second'

Solution
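
For context, the error most likely arises because Series.replace substitutes values and has no `second` keyword, while the datetime-style `replace(second=0)` exists only on individual timestamps. A minimal sketch, assuming a small Series `s` built just for illustration:

```python
import pandas as pd

# Illustrative Series; the values mirror the examples used in this answer.
s = pd.to_datetime(pd.Series(["2016-05-19 08:25:23", "2016-05-19 16:00:45"]))

# Series.replace swaps values (to_replace -> value); it has no `second` keyword.
try:
    s.replace(second=0)
    raised = False
except TypeError:
    raised = True

# A single Timestamp, by contrast, supports datetime-style replace.
ts = s.iloc[0].replace(second=0)
print(raised, ts)
```

This is why the element-wise `apply(lambda t: t.replace(second=0))` variant in the timings works, although it is much slower than the vectorized options.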

  • Solutions if you need datetimes in the output:

    df = pd.DataFrame({'start_date_time': ["2016-05-19 08:25:23","2016-05-19 16:00:45"]})
    df['start_date_time'] = pd.to_datetime(df['start_date_time'])
    print (df)
           start_date_time
    0  2016-05-19 08:25:23
    1  2016-05-19 16:00:45
    

    Use Series.dt.floor with the minute frequency alias, T or Min (pandas 2.2 and later deprecate T in favor of min):

    df['start_date_time'] = df['start_date_time'].dt.floor('T')
    
    df['start_date_time'] = df['start_date_time'].dt.floor('Min')
    

    You can also convert to NumPy values first and then truncate the seconds by casting to <M8[m], but this solution removes any time zone:

    df['start_date_time'] = df['start_date_time'].values.astype('<M8[m]')
    print (df)
          start_date_time
    0 2016-05-19 08:25:00
    1 2016-05-19 16:00:00
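
The time zone caveat can be illustrated with a short sketch. The zone "Europe/Berlin" and the variable names are assumptions chosen only for the example; the point is that the NumPy cast returns naive values expressed in UTC, while dt.floor keeps the zone:

```python
import pandas as pd

# Hypothetical tz-aware column; "Europe/Berlin" is only an illustrative zone.
s = pd.Series(pd.to_datetime(["2016-05-19 08:25:23", "2016-05-19 16:00:45"]))
s = s.dt.tz_localize("Europe/Berlin")

floored = s.dt.floor("min")                       # keeps the time zone
truncated = pd.Series(s.values.astype("<M8[m]"))  # naive values, expressed in UTC

print(floored.dt.tz)      # Europe/Berlin
print(truncated.dt.tz)    # None
print(truncated.iloc[0])  # 2016-05-19 06:25:00 (UTC wall time, offset lost)
```

If the column is time zone aware, dt.floor is therefore usually the safer choice.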
    

    Another solution is to create a timedelta Series from the seconds and subtract it:

    print (pd.to_timedelta(df['start_date_time'].dt.second, unit='s'))
    0   00:00:23
    1   00:00:45
    Name: start_date_time, dtype: timedelta64[ns]
    
    df['start_date_time'] = (df['start_date_time'] -
                             pd.to_timedelta(df['start_date_time'].dt.second, unit='s'))
    print (df)
          start_date_time
    0 2016-05-19 08:25:00
    1 2016-05-19 16:00:00
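
One difference worth keeping in mind: subtracting only the whole seconds leaves any fractional seconds untouched, whereas flooring to the minute removes them. A small sketch, assuming input with sub-second precision (the sample values are made up for illustration):

```python
import pandas as pd

# Illustrative timestamps with fractional seconds.
s = pd.Series(pd.to_datetime(["2016-05-19 08:25:23.500", "2016-05-19 16:00:45.250"]))

# Subtracting only the whole seconds leaves the fractional part in place.
minus_seconds = s - pd.to_timedelta(s.dt.second, unit="s")

# Flooring to the minute drops everything below the minute.
floored = s.dt.floor("min")

print(minus_seconds.iloc[0])  # 2016-05-19 08:25:00.500000
print(floored.iloc[0])        # 2016-05-19 08:25:00
```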
    

    Timings:

    df = pd.DataFrame({'start_date_time': ["2016-05-19 08:25:23","2016-05-19 16:00:45"]})
    df['start_date_time'] = pd.to_datetime(df['start_date_time'])
    
    #20000 rows
    df = pd.concat([df]*10000).reset_index(drop=True)
    
    
    In [28]: %timeit df['start_date_time'] = df['start_date_time'] - pd.to_timedelta(df['start_date_time'].dt.second, unit='s')
    4.05 ms ± 130 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
    
    In [29]: %timeit df['start_date_time1'] = df['start_date_time'].values.astype('<M8[m]')
    1.73 ms ± 117 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
    
    In [30]: %timeit df['start_date_time'] = df['start_date_time'].dt.floor('T')
    1.07 ms ± 116 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
    
    In [31]: %timeit df['start_date_time2'] = df['start_date_time'].apply(lambda t: t.replace(second=0))
    183 ms ± 19.7 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
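
The numbers above come from IPython's %timeit. Outside IPython, a rough equivalent can be sketched with the standard timeit module; the `candidates` dictionary and the repeat count are assumptions for illustration, and absolute timings will differ by machine and pandas version:

```python
import timeit

import pandas as pd

df = pd.DataFrame({"start_date_time": ["2016-05-19 08:25:23", "2016-05-19 16:00:45"]})
df["start_date_time"] = pd.to_datetime(df["start_date_time"])
df = pd.concat([df] * 10000).reset_index(drop=True)  # 20000 rows
s = df["start_date_time"]

# Each candidate is a zero-argument callable so timeit can call it repeatedly.
candidates = {
    "subtract timedelta": lambda: s - pd.to_timedelta(s.dt.second, unit="s"),
    "numpy cast": lambda: s.values.astype("<M8[m]"),
    "dt.floor": lambda: s.dt.floor("min"),
    "apply + replace": lambda: s.apply(lambda t: t.replace(second=0)),
}

for name, func in candidates.items():
    seconds = timeit.timeit(func, number=5) / 5  # mean seconds per call
    print(f"{name:>20}: {seconds * 1e3:.2f} ms")
```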
    

  • Solutions if you need string representations of datetimes in the output:

    Use Series.dt.strftime:

    print(df['start_date_time'].dt.strftime('%Y-%m-%d %H:%M'))
    0    2016-05-19 08:25
    1    2016-05-19 16:00
    Name: start_date_time, dtype: object
    

    And, if necessary, set the seconds to :00:

    print(df['start_date_time'].dt.strftime('%Y-%m-%d %H:%M:00'))
    0    2016-05-19 08:25:00
    1    2016-05-19 16:00:00
    Name: start_date_time, dtype: object
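
Note that strftime returns plain strings, so the result no longer behaves like a datetime column. A minimal sketch of the round trip, assuming you later need datetimes again (the variable names are illustrative):

```python
import pandas as pd

s = pd.Series(pd.to_datetime(["2016-05-19 08:25:23", "2016-05-19 16:00:45"]))

# Formatting produces strings; datetime operations no longer apply to them.
as_text = s.dt.strftime("%Y-%m-%d %H:%M")

# Parsing the strings back yields datetimes with the seconds set to zero.
back = pd.to_datetime(as_text, format="%Y-%m-%d %H:%M")

print(type(as_text.iloc[0]).__name__)  # str
print(back.iloc[0])                    # 2016-05-19 08:25:00
```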