I have a dataframe with two columns: Category and Datetime
I want to create a new column that shows the difference between the Datetime of the current row vs the previous row, restarting at each category.
What I have:
Category Datetime
A 2018-02-01 01:51:04
A 2018-02-01 02:04:04
B 2018-02-01 02:28:34
B 2018-02-01 02:41:34
B 2018-02-01 02:45:34
What I want:
Category Datetime Difference
A 2018-02-01 01:51:04 NaT
A 2018-02-01 02:04:04 00:13:00
B 2018-02-01 02:28:34 NaT
B 2018-02-01 02:41:34 00:13:00
B 2018-02-01 02:45:34 00:04:00
EDIT:
@sacul I tried your solution of doing df['Difference'] = list(by_group.apply(lambda x: x['Datetime']-x['Datetime'].shift()))
but it's giving me weird results...here's the actual data I'm working with:
Category Datetime Difference
A 2/1/18 1:51 NaT
A 2/1/18 2:04 1 days 02:52:00
B 2/1/18 2:28 NaT
C 2/1/18 2:41 NaT
D 2/1/18 6:31 0 days 00:10:30
E 2/1/18 8:26 3 days 23:19:30
F 2/1/18 10:03 0 days 00:21:00
G 2/1/18 11:11 NaT
G 2/1/18 11:11 NaT
G 2/1/18 11:11 0 days 00:00:02
G 2/1/18 11:11 0 days 00:02:30
H 2/1/18 11:12 0 days 00:00:02
H 2/1/18 11:22 0 days 00:02:28
I 2/1/18 15:26 0 days 00:00:02
I 2/1/18 16:01 0 days 00:08:26
I 2/1/18 17:26 0 days 00:00:01
J 2/1/18 17:42 0 days 00:01:31
J 2/1/18 17:42 NaT
alternative solution
import pandas as pd
import numpy as np
df.DateTime = pd.to_datetime(df.DateTime)
df['Difference'] = np.where(df.Category == df.Category.shift(), df.DateTime - df.DateTime.shift(), np.nan)
note: this only works if your data is presorted