How to remove seconds from datetime?

I have the following dates and tried the following code:

df['start_date_time'] = [
    "2016-05-19 08:25:00", "2016-05-19 16:00:00", "2016-05-20 07:45:00",
    "2016-05-24 12:50:00", "2016-05-25 23:00:00", "2016-05-26 19:45:00"]
df['start_date_time'] = pd.to_datetime(df['start_date_time']).replace(second=0)

I get the following error:

TypeError: replace() got an unexpected keyword argument 'second'

Solution

Solutions if you need datetimes in the output:

df = pd.DataFrame({'start_date_time': ["2016-05-19 08:25:23","2016-05-19 16:00:45"]})
df['start_date_time'] = pd.to_datetime(df['start_date_time'])
print (df)
       start_date_time
0  2016-05-19 08:25:23
1  2016-05-19 16:00:45

Use Series.dt.floor with a minute frequency, T or Min (pandas 2.2 and later deprecate the T alias in favor of min):

df['start_date_time'] = df['start_date_time'].dt.floor('T')

df['start_date_time'] = df['start_date_time'].dt.floor('Min')
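
As a self-contained sketch of the floor approach, the snippet below uses the lowercase min alias, which should avoid the deprecation warning in newer pandas versions. The variable name floored is introduced here only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'start_date_time': ["2016-05-19 08:25:23", "2016-05-19 16:00:45"]})
df['start_date_time'] = pd.to_datetime(df['start_date_time'])

# Round every timestamp down to the start of its minute;
# the result is still a datetime column, not text
floored = df['start_date_time'].dt.floor('min')
print(floored)
```

Because the result keeps its datetime dtype, it can be assigned straight back to the column.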

You can also convert to NumPy values first and then truncate the seconds by casting to <M8[m], but this solution removes any timezone information:

df['start_date_time'] = df['start_date_time'].values.astype('<M8[m]')
print (df)
      start_date_time
0 2016-05-19 08:25:00
1 2016-05-19 16:00:00
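
To make the timezone caveat concrete, here is a small sketch. It assumes a timezone-aware column localized to Europe/Berlin, which is a hypothetical choice and not part of the original data.

```python
import pandas as pd

# Hypothetical timezone-aware input (Europe/Berlin is an assumption for illustration)
s = pd.Series(
    pd.to_datetime(["2016-05-19 08:25:23", "2016-05-19 16:00:45"])
).dt.tz_localize("Europe/Berlin")

# dt.floor works on the aware values and keeps the timezone
floored = s.dt.floor("min")

# .values yields naive NumPy datetimes expressed in UTC, so the cast drops the timezone
cast = pd.Series(s.values.astype("<M8[m]"))

print(floored.dt.tz)
print(cast.dt.tz)
print(cast.iloc[0])
```

The cast result holds UTC wall-clock times without any timezone attached, so the local times shift unless the timezone is localized and converted back afterwards.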

Another solution is to create a timedelta Series from the seconds and subtract it:

print (pd.to_timedelta(df['start_date_time'].dt.second, unit='s'))
0   00:00:23
1   00:00:45
Name: start_date_time, dtype: timedelta64[ns]

df['start_date_time'] = (df['start_date_time'] -
                         pd.to_timedelta(df['start_date_time'].dt.second, unit='s'))
print (df)
      start_date_time
0 2016-05-19 08:25:00
1 2016-05-19 16:00:00
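
One thing to keep in mind with the subtraction approach: it removes only whole seconds. The sketch below uses a hypothetical timestamp with a fractional second (not in the original data) to compare it with dt.floor.

```python
import pandas as pd

# Hypothetical value with a fractional second, used only for illustration
s = pd.Series(pd.to_datetime(["2016-05-19 08:25:23.500"]))

# Subtracting dt.second removes the 23 whole seconds but not the .500 part
by_subtraction = s - pd.to_timedelta(s.dt.second, unit="s")

# Flooring to the minute removes seconds and any sub-second part
by_floor = s.dt.floor("min")

print(by_subtraction.iloc[0])
print(by_floor.iloc[0])
```

If the data can contain sub-second precision, flooring is probably the safer choice.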

Timings:

df = pd.DataFrame({'start_date_time': ["2016-05-19 08:25:23","2016-05-19 16:00:45"]})
df['start_date_time'] = pd.to_datetime(df['start_date_time'])

#20000 rows
df = pd.concat([df]*10000).reset_index(drop=True)


In [28]: %timeit df['start_date_time'] = df['start_date_time'] - pd.to_timedelta(df['start_date_time'].dt.second, unit='s')
4.05 ms ± 130 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [29]: %timeit df['start_date_time1'] = df['start_date_time'].values.astype('<M8[m]')
1.73 ms ± 117 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [30]: %timeit df['start_date_time'] = df['start_date_time'].dt.floor('T')
1.07 ms ± 116 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [31]: %timeit df['start_date_time2'] = df['start_date_time'].apply(lambda t: t.replace(second=0))
183 ms ± 19.7 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
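
The timings above come from IPython's %timeit. Outside IPython, a rough equivalent can be sketched with the standard timeit module. The names candidates and results are illustrative, and absolute numbers will differ by machine and pandas version.

```python
import timeit
import pandas as pd

df = pd.DataFrame({'start_date_time': ["2016-05-19 08:25:23", "2016-05-19 16:00:45"]})
df['start_date_time'] = pd.to_datetime(df['start_date_time'])
df = pd.concat([df] * 10000).reset_index(drop=True)  # 20000 rows
s = df['start_date_time']

# The four approaches compared in the timings above
candidates = {
    "subtract seconds": lambda: s - pd.to_timedelta(s.dt.second, unit='s'),
    "numpy cast": lambda: s.values.astype('<M8[m]'),
    "dt.floor": lambda: s.dt.floor('min'),
    "apply + replace": lambda: s.apply(lambda t: t.replace(second=0)),
}

results = {}
for name, func in candidates.items():
    # Best of 3 repeats with 3 calls each, reported as seconds per call
    results[name] = min(timeit.repeat(func, number=3, repeat=3)) / 3
    print(f"{name}: {results[name] * 1e3:.2f} ms per call")
```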

Solutions if you need string representations of datetimes in the output:

Use Series.dt.strftime:

print(df['start_date_time'].dt.strftime('%Y-%m-%d %H:%M'))
0    2016-05-19 08:25
1    2016-05-19 16:00
Name: start_date_time, dtype: object

If necessary, set the seconds to :00:

print(df['start_date_time'].dt.strftime('%Y-%m-%d %H:%M:00'))
0    2016-05-19 08:25:00
1    2016-05-19 16:00:00
Name: start_date_time, dtype: object
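
Note that strftime returns text, not datetimes. The following sketch (the variable names as_text and round_trip are illustrative) shows the formatted values and one way to parse them back if datetime operations are needed later.

```python
import pandas as pd

s = pd.Series(pd.to_datetime(["2016-05-19 08:25:23", "2016-05-19 16:00:45"]))

# Formatting drops the seconds but also turns each value into a plain string
as_text = s.dt.strftime('%Y-%m-%d %H:%M')
print(as_text.tolist())

# Parse the strings again to get datetimes back; the seconds are now zero
round_trip = pd.to_datetime(as_text, format='%Y-%m-%d %H:%M')
print(round_trip.dt.second.tolist())
```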