pandas, datetime, rounding

Pandas df round datetime with consistent timedelta


How can I round to the closest full minute without getting duplicates? The data that I have does not have a perfectly consistent time delta between the entries.

For example, I have:

data1 2023-12-10T20:01:00.58Z

data2 2023-12-10T20:02:00.58Z

data3 2023-12-10T20:02:59.62Z

Rounding down here gives 20:01, 20:02, 20:02, so the last two entries collide.

So why not round to the closest minute? Because there can be timestamps like
20:01:29, 20:02:29, 20:03:31, 20:04:31, and this would lead to 20:01, 20:02, 20:04, 20:05.

But I want a consistent time delta in the result.

I know that within any one data series, the timestamps always cluster around the same second of the minute. That means the first and the second example cannot occur together in one series.
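
The two failure modes described above can be reproduced with a short sketch. The variable names are illustrative only, and the second series is given a date purely for the example (only the times are listed above).

```python
import pandas as pd

# First example: values sit just after or just before the full minute.
near_minute = pd.Series(pd.to_datetime([
    "2023-12-10T20:01:00.58Z",
    "2023-12-10T20:02:00.58Z",
    "2023-12-10T20:02:59.62Z",
]))

# Second example: values sit around the half minute (date assumed for illustration).
near_half = pd.Series(pd.to_datetime([
    "2023-12-10T20:01:29Z",
    "2023-12-10T20:02:29Z",
    "2023-12-10T20:03:31Z",
    "2023-12-10T20:04:31Z",
]))

floored = near_minute.dt.floor("1min")  # 20:01, 20:02, 20:02 (duplicate)
rounded = near_half.dt.round("1min")    # 20:01, 20:02, 20:04, 20:05 (gap)

print(floored.duplicated().any())              # True: two entries share 20:02
print(rounded.diff().dropna().nunique() > 1)   # True: the deltas are not all equal
```

Neither `floor` nor `round` on its own looks at the neighbouring timestamps, which is why each one breaks on one of the two series.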

For playing around:

import pandas as pd

# Creating a pandas DataFrame
data = {'data1': ['2023-12-10T20:01:00.58Z'],
        'data2': ['2023-12-10T20:02:00.58Z'],
        'data3': ['2023-12-10T20:02:59.62Z']}

df = pd.DataFrame(data)

# Converting the string columns to datetime
df['data1'] = pd.to_datetime(df['data1'])
df['data2'] = pd.to_datetime(df['data2'])
df['data3'] = pd.to_datetime(df['data3'])

I want to round the datetime fields while keeping a delta of 1 minute between them.


Solution

  • IIUC, you can compute the diff of successive values, round each diff to the minute, take the cumsum, and add the result to the rounded first value:

    delta = df.iloc[0].diff().fillna('0').dt.round(freq='1min').cumsum()
    df.iloc[0] = delta.add(df.iat[0, 0].round('1min'))
    

    NB. This assumes that the values are sorted. If not, use .sort_values().diff() instead of .diff().

    If you know the data is sorted and that the successive deltas are always close to 1min:

    # pd.to_timedelta is used because TimedeltaIndex(..., unit=...) is deprecated since pandas 2.2
    df.iloc[0] = (pd.to_timedelta(range(df.shape[1]), unit='min')
                  + df.iat[0, 0].round('1min')
                 )
    

    Output:

                          data1                     data2                     data3
    0 2023-12-10 20:01:00+00:00 2023-12-10 20:02:00+00:00 2023-12-10 20:03:00+00:00
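
The same diff, round, cumsum idea can also be applied when the timestamps are stored down a column rather than across a row. This is only a minimal sketch under that assumption; `snap_to_minute_grid` is a hypothetical helper name, not a pandas function.

```python
import pandas as pd

def snap_to_minute_grid(ts: pd.Series, freq: str = "1min") -> pd.Series:
    """Round the first timestamp, then rebuild the rest from rounded deltas."""
    ts = ts.sort_values()
    # Gaps between neighbours, each rounded to a whole multiple of `freq`;
    # the first gap is undefined (NaT) and is treated as zero.
    steps = ts.diff().fillna(pd.Timedelta(0)).dt.round(freq)
    # Accumulate the rounded gaps on top of the rounded starting point.
    return steps.cumsum() + ts.iloc[0].round(freq)

stamps = pd.Series(pd.to_datetime([
    "2023-12-10T20:01:00.58Z",
    "2023-12-10T20:02:00.58Z",
    "2023-12-10T20:02:59.62Z",
]))

snapped = snap_to_minute_grid(stamps)
print(snapped.dt.strftime("%H:%M").tolist())  # ['20:01', '20:02', '20:03']
```

Because each gap is rounded separately, a missing sample (a gap close to 2 minutes) stays a 2-minute step in the result instead of being squeezed into 1 minute.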