I have the following data and tried this code:
import pandas as pd

df = pd.DataFrame({'start_date_time': [
    "2016-05-19 08:25:00", "2016-05-19 16:00:00", "2016-05-20 07:45:00",
    "2016-05-24 12:50:00", "2016-05-25 23:00:00", "2016-05-26 19:45:00"]})
df['start_date_time'] = pd.to_datetime(df['start_date_time']).replace(second=0)
I get the following error:
TypeError: replace() got an unexpected keyword argument 'second'
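A minimal sketch that reproduces this with one hypothetical value: the same replace(second=0) call succeeds on a single element but fails on the whole Series.

```python
import pandas as pd

s = pd.Series(pd.to_datetime(["2016-05-19 08:25:23"]))

# Works: Timestamp.replace on one element accepts second=...
one = s.iloc[0].replace(second=0)

# Fails: Series.replace is value substitution and rejects the `second` keyword.
try:
    s.replace(second=0)
    error = None
except TypeError as exc:
    error = exc

print(one)
print(error)
```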
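The error occurs because Series.replace substitutes values and has no second parameter; replace(second=0) exists only on individual Timestamp objects.
Solutions if you need datetimes in the output: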
df = pd.DataFrame({'start_date_time': ["2016-05-19 08:25:23","2016-05-19 16:00:45"]})
df['start_date_time'] = pd.to_datetime(df['start_date_time'])
print (df)
      start_date_time
0 2016-05-19 08:25:23
1 2016-05-19 16:00:45
Use Series.dt.floor with the minute frequency alias 'T' or 'min' (either line works; 'T' is deprecated since pandas 2.2, so prefer 'min'):
df['start_date_time'] = df['start_date_time'].dt.floor('T')
df['start_date_time'] = df['start_date_time'].dt.floor('min')
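A minimal sketch, using hypothetical values that include fractional seconds (not in the original data), showing that dt.floor('min') clears both the seconds and anything below them while keeping a datetime64 dtype:

```python
import pandas as pd

# Hypothetical sample values with fractional seconds.
s = pd.Series([pd.Timestamp("2016-05-19 08:25:23.25"),
               pd.Timestamp("2016-05-19 16:00:45.75")])

# Floor each value to the start of its minute.
floored = s.dt.floor("min")

print(floored)
```

Because floor always rounds down, 16:00:45.75 becomes 16:00:00; dt.round('min') would instead move it to 16:01:00.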
You can also convert to NumPy values first and then truncate the seconds by casting to <M8[m] (minute-resolution datetime64), but this solution drops any timezone information:
df['start_date_time'] = df['start_date_time'].values.astype('<M8[m]')
print (df)
      start_date_time
0 2016-05-19 08:25:00
1 2016-05-19 16:00:00
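To illustrate the timezone caveat, a minimal sketch assuming hypothetical tz-aware data with a fixed UTC+02:00 offset: .values on tz-aware data gives naive datetime64 values in UTC, so the cast result loses the offset and shows UTC wall-clock times, while dt.floor keeps the timezone.

```python
import datetime as dt

import pandas as pd

# Hypothetical fixed UTC+02:00 offset, chosen only for illustration.
tz = dt.timezone(dt.timedelta(hours=2))
s = pd.Series([pd.Timestamp("2016-05-19 08:25:23"),
               pd.Timestamp("2016-05-19 16:00:45")]).dt.tz_localize(tz)

# .values converts tz-aware data to naive UTC datetime64 before the cast.
via_numpy = pd.Series(s.values.astype("<M8[m]"), index=s.index)

# dt.floor operates on the tz-aware values and keeps the offset.
via_floor = s.dt.floor("min")

print(via_numpy)
print(via_floor)
```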
Another solution is to create a timedelta Series from the second component and subtract it (this removes whole seconds only; any fractional part of a second remains):
print (pd.to_timedelta(df['start_date_time'].dt.second, unit='s'))
0 00:00:23
1 00:00:45
Name: start_date_time, dtype: timedelta64[ns]
df['start_date_time'] = (df['start_date_time'] -
                         pd.to_timedelta(df['start_date_time'].dt.second, unit='s'))
print (df)
      start_date_time
0 2016-05-19 08:25:00
1 2016-05-19 16:00:00
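A minimal sketch of the fractional-second caveat, using a hypothetical value with half a second: subtracting dt.second leaves the .5 s in place, whereas dt.floor('min') clears it.

```python
import pandas as pd

# Hypothetical value with a fractional second (not in the original data).
s = pd.Series([pd.Timestamp("2016-05-19 08:25:23.5")])

# dt.second is the integer seconds component (23), so 0.5 s survives.
minus_seconds = s - pd.to_timedelta(s.dt.second, unit="s")

# Flooring to the minute removes everything below the minute.
floored = s.dt.floor("min")

print(minus_seconds.iloc[0], floored.iloc[0])
```

If the data never carries sub-second precision, as in the question, both approaches give the same result.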
Timings:
df = pd.DataFrame({'start_date_time': ["2016-05-19 08:25:23","2016-05-19 16:00:45"]})
df['start_date_time'] = pd.to_datetime(df['start_date_time'])
#20000 rows
df = pd.concat([df]*10000).reset_index(drop=True)
In [28]: %timeit df['start_date_time'] = df['start_date_time'] - pd.to_timedelta(df['start_date_time'].dt.second, unit='s')
4.05 ms ± 130 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [29]: %timeit df['start_date_time1'] = df['start_date_time'].values.astype('<M8[m]')
1.73 ms ± 117 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [30]: %timeit df['start_date_time'] = df['start_date_time'].dt.floor('T')
1.07 ms ± 116 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [31]: %timeit df['start_date_time2'] = df['start_date_time'].apply(lambda t: t.replace(second=0))
183 ms ± 19.7 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
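The timings above use IPython's %timeit. A sketch for reproducing a similar comparison in a plain script with the standard timeit module, on the same 20000-row setup; the apply variant calls Timestamp.replace per element, which is the working form of the question's attempt. Absolute numbers depend on the machine and pandas version, so the sketch only prints them, and it also checks that all four approaches agree on this data.

```python
import timeit

import pandas as pd

# Same two-row sample as the timings, repeated to 20000 rows.
df = pd.DataFrame({"start_date_time": ["2016-05-19 08:25:23", "2016-05-19 16:00:45"]})
df["start_date_time"] = pd.to_datetime(df["start_date_time"])
df = pd.concat([df] * 10000).reset_index(drop=True)
s = df["start_date_time"]

# Each approach returns a new Series and leaves s unchanged.
approaches = {
    "timedelta subtract": lambda: s - pd.to_timedelta(s.dt.second, unit="s"),
    "numpy <M8[m] cast": lambda: pd.Series(s.values.astype("<M8[m]"), index=s.index),
    "dt.floor('min')": lambda: s.dt.floor("min"),
    # Element-wise Timestamp.replace, as attempted in the question.
    "apply + replace": lambda: s.apply(lambda t: t.replace(second=0)),
}

results = {name: fn() for name, fn in approaches.items()}

for name, fn in approaches.items():
    per_loop = timeit.timeit(fn, number=5) / 5
    print(f"{name}: {per_loop * 1000:.2f} ms per loop")
```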
Solutions if you need string representations of the datetimes in the output:
Use Series.dt.strftime:
print(df['start_date_time'].dt.strftime('%Y-%m-%d %H:%M'))
0 2016-05-19 08:25
1 2016-05-19 16:00
Name: start_date_time, dtype: object
And if necessary, write a literal :00 in place of the seconds:
print(df['start_date_time'].dt.strftime('%Y-%m-%d %H:%M:00'))
0 2016-05-19 08:25:00
1 2016-05-19 16:00:00
Name: start_date_time, dtype: object
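A minimal sketch comparing two ways to get the same strings: writing a literal :00 in the format, or flooring to the minute first and then formatting the real seconds. The second form may be handy if the floored datetimes are also needed elsewhere.

```python
import pandas as pd

s = pd.Series(pd.to_datetime(["2016-05-19 08:25:23", "2016-05-19 16:00:45"]))

# Literal ':00' in the format string replaces the seconds.
direct = s.dt.strftime("%Y-%m-%d %H:%M:00")

# Truncate to the minute first, then format the (now zero) seconds.
floored_then_formatted = s.dt.floor("min").dt.strftime("%Y-%m-%d %H:%M:%S")

print(direct.tolist())
print(floored_then_formatted.tolist())
```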