I am attempting to find the change in time within a specific session (my index)--my data frame looks like this:
time
sess_id vis_id
id1 vis_id1 t_0
vis_id1 t_1
vis_id1 t_2
id2 vis_id2 t_0
vis_id2 t_1
vis_id2 t_2
I want to create a column called delta_t
(change in time) which subtracts the time stamps recursively--where the last time for every session contains filler character like a dash or something
time delta_t
sess_id vis_id
id1 vis_id1 t_0 (t_1 - t_0)
vis_id1 t_1 (t_2 - t_1)
vis_id1 t_2 -
id2 vis_id2 t_3 (t_4 - t_3)
vis_id2 t_4 (t_5 - t_4)
vis_id2 t_5 -
We can groupby shift
relative to level=0
or level="sess_id"
to get the next row's value, and subtract from time
:
df['delta_t'] = df.groupby(level='sess_id')['time'].shift(-1) - df['time']
Sample DataFrame and output:
time delta_t
sess_id vis_id
id1 vis_id1 2021-01-11 00:00:00 6 days 04:27:31
vis_id1 2021-01-17 04:27:31 4 days 03:45:26
vis_id1 2021-01-21 08:12:57 NaT
id2 vis_id2 2021-01-28 15:18:32 7 days 17:57:56
vis_id2 2021-02-05 09:16:28 4 days 01:41:58
vis_id2 2021-02-09 10:58:26 NaT
We could groupby diff
then groupby shift
but this involves 2 groupbys:
df['delta_t'] = (
df.groupby(level='sess_id')['time'].diff()
.groupby(level='sess_id').shift(-1)
)
If needing '-' over NaT
np.where
can be used to covert Timedelta to string and fillna with '-':
# Calculate Delta
df['delta_t'] = df.groupby(level='sess_id')['time'].shift(-1) - df['time']
# Change dtype and add in '-'
df['delta_t'] = np.where(df['delta_t'].notna(), df['time'].astype(str), '-')
Alternatively can convert to str
and replace
"NaT" with "-":
# Calculate Delta, convert to String, replace "NaT" with "-"
df['delta_t'] = (
df.groupby(level='sess_id')['time'].shift(-1) - df['time']
).astype(str).replace('NaT', '-')
df
:
time delta_t
sess_id vis_id
id1 vis_id1 2021-01-11 00:00:00 6 days 04:27:31
vis_id1 2021-01-17 04:27:31 4 days 03:45:26
vis_id1 2021-01-21 08:12:57 -
id2 vis_id2 2021-01-28 15:18:32 7 days 17:57:56
vis_id2 2021-02-05 09:16:28 4 days 01:41:58
vis_id2 2021-02-09 10:58:26 -
DataFrame constructor and imports:
import pandas as pd
df = pd.DataFrame(
{'time': pd.to_datetime(['2021-01-11 00:00:00', '2021-01-17 04:27:31',
'2021-01-21 08:12:57', '2021-01-28 15:18:32',
'2021-02-05 09:16:28', '2021-02-09 10:58:26'])},
index=pd.MultiIndex.from_arrays((['id1'] * 3 + ['id2'] * 3,
['vis_id1'] * 3 + ['vis_id2'] * 3),
names=['sess_id', 'vis_id'])
)