I have a dataframe like so:
category name age
parent harry 29
child smith 12
parent sally 41
child david 19
child mike 16
And I want to add a column that groups families based on each occurrence of the value 'parent' in the category column (the dataframe is in order). As in:
category name age family_id
parent harry 29 0
child smith 12 0
parent sally 41 1
child david 19 1
child mike 16 1
I am trying to make the family_id be an incrementing integer.
I've tried a bunch of groupby approaches and am currently trying to write my own apply function, but it's very slow and not working as expected. I haven't been able to find an example that starts a new group at every occurrence of the same column value.
You can use eq to check whether the category column equals parent, and cumsum to turn those matches into a running count; sub subtracts 1, since the cumsum starts from 1 here:
df['family_id'] = df['category'].eq('parent').cumsum().sub(1)
print(df)
  category   name  age  family_id
0   parent  harry   29          0
1    child  smith   12          0
2   parent  sally   41          1
3    child  david   19          1
4    child   mike   16          1
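
To see why this works, here is a self-contained sketch that rebuilds the sample dataframe from the question and splits the one-liner into its steps. The intermediate names is_parent and running_count are only for illustration and are not part of the original answer.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "category": ["parent", "child", "parent", "child", "child"],
    "name": ["harry", "smith", "sally", "david", "mike"],
    "age": [29, 12, 41, 19, 16],
})

# Step 1: boolean mask, True on every row that starts a new family.
is_parent = df["category"].eq("parent")

# Step 2: cumulative sum of the mask (True counts as 1), so the count
# goes up by one at each parent row and stays flat on child rows.
running_count = is_parent.cumsum()

# Step 3: shift the count so that the first family gets id 0.
df["family_id"] = running_count.sub(1)

print(df)
```

Note that this relies on the first row being a parent, as it is in the question. Any child rows that come before the first parent would get a family_id of -1.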