I have a dataframe as follows:
age_start | age_end |
---|---|
2 | 6 |
6 | 10 |
11 | 16 |
17 | 18 |
21 | 25 |
27 | 30 |
30 | 34 |
I want to aggregate successive rows where the values between the age_end overlap with the subsequent age_start value. For example, the first two rows would be collapsed because 6 is the overlapping value amongst them. The last two rows would also be collapsed because the overlapping value is 30. The goal is to create broader age groups and scale it so the function can aggregate any number of successive rows and not just pairs. The desired output is:
age_start | age_end |
---|---|
2 | 10 |
11 | 16 |
17 | 18 |
21 | 25 |
27 | 34 |
# Mark transitions:
df.loc[df.age_start.gt(df.age_end.shift(1)), 'group'] = 1
# Create Groups:
df['group'] = df['group'].cumsum().ffill().fillna(0)
# Extract start/stop point from groups:
out = df.groupby('group').agg({'age_start':'min', 'age_end':'max'}).reset_index(drop=True)
print(out)
Output:
age_start age_end
0 2 10
1 11 16
2 17 18
3 21 25
4 27 34