Search code examples
pythondataframegroup-byshiftsubtraction

Subtracting datetime value from previous row in pandas dataframe


I have a dataframe with two columns: Category and Datetime

I want to create a new column that shows the difference between the Datetime of the current row vs the previous row, restarting at each category.

What I have:

Category    Datetime
A           2018-02-01 01:51:04
A           2018-02-01 02:04:04
B           2018-02-01 02:28:34 
B           2018-02-01 02:41:34
B           2018-02-01 02:45:34    

What I want:

Category    Datetime               Difference
A           2018-02-01 01:51:04    NaT
A           2018-02-01 02:04:04    00:13:00
B           2018-02-01 02:28:34    NaT
B           2018-02-01 02:41:34    00:13:00
B           2018-02-01 02:45:34    00:04:00

EDIT:

@sacul I tried your solution of doing df['Difference'] = list(by_group.apply(lambda x: x['Datetime']-x['Datetime'].shift())) but it's giving me weird results...here's the actual data I'm working with:

Category    Datetime        Difference
A           2/1/18 1:51     NaT
A           2/1/18 2:04     1 days 02:52:00
B           2/1/18 2:28     NaT
C           2/1/18 2:41     NaT
D           2/1/18 6:31     0 days 00:10:30
E           2/1/18 8:26     3 days 23:19:30
F           2/1/18 10:03    0 days 00:21:00
G           2/1/18 11:11    NaT
G           2/1/18 11:11    NaT
G           2/1/18 11:11    0 days 00:00:02
G           2/1/18 11:11    0 days 00:02:30
H           2/1/18 11:12    0 days 00:00:02
H           2/1/18 11:22    0 days 00:02:28
I           2/1/18 15:26    0 days 00:00:02
I           2/1/18 16:01    0 days 00:08:26
I           2/1/18 17:26    0 days 00:00:01
J           2/1/18 17:42    0 days 00:01:31
J           2/1/18 17:42    NaT

Solution

  • alternative solution

    import pandas as pd
    import numpy as np
    df.DateTime = pd.to_datetime(df.DateTime)
    
    
    df['Difference'] = np.where(df.Category == df.Category.shift(), df.DateTime - df.DateTime.shift(), np.nan)
    

    note: this only works if your data is presorted