python  pandas  pandas-groupby  pandasql

Add index in pandas based on each occurrence of a specific value in another column


I have a dataframe like so:

category name   age 
parent  harry   29
child   smith   12
parent  sally   41
child   david   19
child   mike    16

And I want to add a column that groups families based on each occurrence of the value 'parent' in the category column (the dataframe is in order). As in:

category name   age  family_id
parent  harry   29     0
child   smith   12     0
parent  sally   41     1
child   david   19     1
child   mike    16     1

I am trying to make the family_id be an incrementing integer.

I've tried a bunch of groupby approaches and am currently trying to write my own apply function, but it's very slow and not working as expected. I haven't been able to find an example that groups rows based on each occurrence of a particular column value.


Solution

  • You can use eq to flag the rows where the category column equals 'parent', then cumsum to keep a running count of those flags. sub(1) subtracts 1, because the running count is already 1 at the first parent row:

    df['family_id'] = df['category'].eq('parent').cumsum().sub(1)
    print(df)
    

      category   name  age  family_id
    0   parent  harry   29          0
    1    child  smith   12          0
    2   parent  sally   41          1
    3    child  david   19          1
    4    child   mike   16          1
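
    For anyone who wants to run this end to end, here is a minimal self-contained sketch. It rebuilds the sample dataframe from the question and splits the one-liner into steps. The names is_parent and family_sizes are made up for illustration, and the groupby at the end is only one possible way to use the new column.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "category": ["parent", "child", "parent", "child", "child"],
    "name": ["harry", "smith", "sally", "david", "mike"],
    "age": [29, 12, 41, 19, 16],
})

# Boolean Series: True on every row that starts a new family.
is_parent = df["category"].eq("parent")

# Running count of parent rows seen so far, shifted to start at 0.
df["family_id"] = is_parent.cumsum().sub(1)

# Illustrative follow-up: number of rows in each family.
family_sizes = df.groupby("family_id").size()

print(df)
print(family_sizes)
```

    Note that this relies on the rows being in order with each parent row coming before its children, as stated in the question. Any rows that appear before the first parent would get a family_id of -1.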