Does .diff(periods=-10) work on a pandas Series?


I have a dataframe like so:

import pandas as pd
import numpy as np

date_rng = pd.date_range(start="2023-11-18", periods=3, freq="10S")
values = [4, 2, 3]
df = pd.DataFrame(data={"values": values}, index=date_rng)
df["dt"] = df.index.to_series().diff().dt.seconds  # vs. previous row; overwritten on the next line
df["dt"] = df.index.to_series().diff(periods=2).dt.seconds  # vs. two rows back
df["dt_neg"] = df.index.to_series().diff(periods=-1).dt.seconds  # vs. next row
print(df)

gives

                     values    dt   dt_neg
2023-11-18 00:00:00       4   NaN  86390.0
2023-11-18 00:00:10       2   NaN  86390.0
2023-11-18 00:00:20       3  20.0      NaN

Shouldn't negative values work, too?

I read the answers here and here.


Solution

  • .dt.seconds returns only the seconds component of each timedelta (0 to 86399) and ignores the days component, so it is not the total duration. You need to use total_seconds():

    df["dt_neg"] = df.index.to_series().diff(periods=-1).dt.total_seconds()
    

    Output:

                         values    dt  dt_neg
    2023-11-18 00:00:00       4   NaN   -10.0
    2023-11-18 00:00:10       2   NaN   -10.0
    2023-11-18 00:00:20       3  20.0     NaN
    

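    Indeed, negative timedeltas have a peculiar representation: a negative number of days plus a positive remainder, so -10 seconds is stored as -1 days +23:59:50, whose seconds component is 86390: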

    df.index.to_series().diff(periods=-1)
    
    2023-11-18 00:00:00   -1 days +23:59:50
    2023-11-18 00:00:10   -1 days +23:59:50
    2023-11-18 00:00:20                 NaT
    Freq: 10S, dtype: timedelta64[ns]
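
To see where 86390 comes from, it may help to look at the components of a single negative timedelta. The following is a minimal sketch, not part of the original answer; the names td, s, seconds_component and signed_seconds are made up for illustration.

```python
import pandas as pd

# A duration of -10 seconds, like the values diff(periods=-1) yields above.
td = pd.Timedelta(seconds=-10)

# pandas stores it as a negative day count plus a non-negative remainder.
print(td.days)             # -1
print(td.seconds)          # 86390  (24 * 3600 - 10)
print(td.total_seconds())  # -10.0

# The same holds element-wise through the .dt accessor on a Series.
s = pd.Series(pd.to_timedelta([-10, 10], unit="s"))
seconds_component = s.dt.seconds.tolist()       # component only, never negative
signed_seconds = s.dt.total_seconds().tolist()  # full signed duration
print(seconds_component)
print(signed_seconds)
```

For positive timedeltas shorter than one day the two accessors agree, which is why the dt column looked correct while dt_neg did not.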