How can I round timestamps to the nearest full minute without getting duplicates? The data I have does not have a perfectly accurate time delta between the entries.
For example, I have:
data1 2023-12-10T20:01:00.58Z
data2 2023-12-10T20:02:00.58Z
data3 2023-12-10T20:02:59.62Z
Rounding down here gives 20:01, 20:02, 20:02, i.e. a duplicate.
So why not round to the nearest minute? Because there can also be timestamps like 20:01:29, 20:02:29, 20:03:31, 20:04:31, which would round to 20:01, 20:02, 20:04, 20:05 (20:03 is skipped).
But I want the time delta in the result to stay consistent.
I do know that within one data series, the timestamps always cluster around the same second of the minute. That means the first and the second example can never occur together in the same series.
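To make the problem concrete, here is a small sketch reproducing both failure modes with pandas' .dt.floor and .dt.round. The date for the second series is an assumption, since the question only gives times of day.

```python
import pandas as pd

# Example 1: roughly 1 minute apart, jitter around the :00 second mark
s1 = pd.to_datetime(pd.Series([
    "2023-12-10T20:01:00.58Z",
    "2023-12-10T20:02:00.58Z",
    "2023-12-10T20:02:59.62Z",
]))

# Example 2: jitter around the :30 second mark (date assumed for illustration)
s2 = pd.to_datetime(pd.Series([
    "2023-12-10T20:01:29Z",
    "2023-12-10T20:02:29Z",
    "2023-12-10T20:03:31Z",
    "2023-12-10T20:04:31Z",
]))

# Flooring example 1 produces a duplicate minute
floored = s1.dt.floor("1min")
# Rounding example 2 to the nearest minute skips a minute
rounded = s2.dt.round("1min")

print(floored.dt.strftime("%H:%M").tolist())  # ['20:01', '20:02', '20:02']
print(rounded.dt.strftime("%H:%M").tolist())  # ['20:01', '20:02', '20:04', '20:05']
```

Neither floor nor round alone works for both series, which is why the rounding has to take the spacing between entries into account.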
For playing around:
import pandas as pd
import plotly.express as px
# Creating a pandas DataFrame
data = {'data1': ['2023-12-10T20:01:00.58Z'],
'data2': ['2023-12-10T20:02:00.58Z'],
'data3': ['2023-12-10T20:02:59.62Z']}
df = pd.DataFrame(data)
# Converting the string columns to datetime
df['data1'] = pd.to_datetime(df['data1'])
df['data2'] = pd.to_datetime(df['data2'])
df['data3'] = pd.to_datetime(df['data3'])
I want to round the datetime fields while keeping a delta of exactly 1 minute between them.
IIUC, you can compute the diff of successive values, round it, take the cumsum, and add the result to the rounded first value:
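# Round each gap between neighbours to whole minutes, then accumulate the gaps
delta = df.iloc[0].diff().fillna(pd.Timedelta(0)).dt.round(freq='1min').cumsum()
# Add the accumulated gaps to the first value rounded to the minute
df.iloc[0] = delta.add(df.iat[0, 0].round('1min'))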
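NB. This assumes that the values are sorted. If they are not, use .sort_values().diff() instead of .diff(), anchor on the earliest value (df.iloc[0].min()) instead of df.iat[0, 0], and put the result back into the original column order (e.g. with .reindex(df.columns)) before assigning it, since assignment through .iloc is positional.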
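The same diff/round/cumsum idea can be wrapped for a single Series, which is how a data series is often stored (one column, many rows). This is a sketch: snap_to_minute_grid is a hypothetical helper name, and it handles unsorted input by sorting, anchoring on the earliest value, and restoring the original order via the index (which is assumed to be unique).

```python
import pandas as pd

def snap_to_minute_grid(s: pd.Series) -> pd.Series:
    """Hypothetical helper: snap timestamps to whole minutes, keeping rounded gaps."""
    ordered = s.sort_values()
    # Gap to the previous timestamp; the first entry has none, so use 0
    steps = ordered.diff().fillna(pd.Timedelta(0)).dt.round("1min")
    # Anchor on the earliest timestamp rounded to the minute, then add the gaps
    snapped = ordered.iloc[0].round("1min") + steps.cumsum()
    # Return the values in the caller's original order
    return snapped.reindex(s.index)

s = pd.to_datetime(pd.Series([
    "2023-12-10T20:01:29Z",
    "2023-12-10T20:02:29Z",
    "2023-12-10T20:03:31Z",
    "2023-12-10T20:04:31Z",
]))
print(snap_to_minute_grid(s).dt.strftime("%H:%M").tolist())
# ['20:01', '20:02', '20:03', '20:04']
```

Because only the gaps are rounded, a jitter that drifts across the :30 mark no longer causes a skipped or duplicated minute, as long as each individual gap stays close to a whole number of minutes.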
If you know the data is sorted and that the successive deltas are always close to 1min:
# 0, 1, 2, ... minutes added to the first value rounded to the minute
df.iloc[0] = (pd.to_timedelta(range(df.shape[1]), unit='min')
              + df.iat[0, 0].round('1min')
              )
Output:
data1 data2 data3
0 2023-12-10 20:01:00+00:00 2023-12-10 20:02:00+00:00 2023-12-10 20:03:00+00:00
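For the fixed-step case, a sketch for a single sorted Series could build the grid with pd.date_range and first check the "deltas are always close to 1min" assumption rather than trusting it silently. The helper name minute_grid and the error message are hypothetical.

```python
import pandas as pd

def minute_grid(s: pd.Series) -> pd.Series:
    """Hypothetical helper: 1-minute grid anchored on the first (sorted) value."""
    # Every gap, rounded to the minute, must be exactly 1 minute
    steps = s.diff().dropna().dt.round("1min")
    if not (steps == pd.Timedelta(minutes=1)).all():
        raise ValueError("gaps are not all ~1 minute; use the diff/cumsum approach")
    grid = pd.date_range(start=s.iloc[0].round("1min"), periods=len(s), freq="1min")
    return pd.Series(grid, index=s.index)

s = pd.to_datetime(pd.Series([
    "2023-12-10T20:01:00.58Z",
    "2023-12-10T20:02:00.58Z",
    "2023-12-10T20:02:59.62Z",
]))
print(minute_grid(s).dt.strftime("%H:%M").tolist())
# ['20:01', '20:02', '20:03']
```

The check makes a missing or extra sample show up as an error instead of a grid that quietly disagrees with the data.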