I have a dataframe like below :
id_ = ['A','A','A','A','A','B','B','B','B']
days = [1,10,20,25,100,10,20,30,50]
event = [0,0,0,1,0,0,0,1,0]
df= pd.DataFrame({'id':id_,'days':days,'event':event})
Input:
id days event
0 A 1 0
1 A 10 0
2 A 20 0
3 A 25 1
4 A 100 0
5 B 10 0
6 B 20 0
7 B 30 1
8 B 50 0
I want to make a group around event = 1 :
I want to group the rows within 30 days of the row with event = 1, per id.
I expect the output below (without for loop):
id days event group
0 A 1 0 1.0
1 A 10 0 1.0
2 A 20 0 1.0
3 A 25 1 1.0
4 A 100 0 NaN
5 B 10 0 2.0
6 B 20 0 2.0
7 B 30 1 2.0
8 B 50 0 2.0
You can use a merge_asof
:
df['group'] = (pd.merge_asof(
df.reset_index().sort_values(by='days'),
df[df['event'].eq(1)].sort_values(by='days'),
by='id', on='days',
direction='nearest', tolerance=29)
.set_index('index')['event_y']
)
df['group'] *= pd.factorize(df['id'])[0]+1
Output:
id days event group
0 A 1 0 1.0
1 A 10 0 1.0
2 A 20 0 1.0
3 A 25 1 1.0
4 A 100 0 NaN
5 B 10 0 2.0
6 B 20 0 2.0
7 B 30 1 2.0
8 B 50 0 2.0