Get consecutive occurrences of an event by group in pandas


I'm working with a DataFrame that has id, wage and date, like this:

id   wage   date
1    100    201212
1    100    201301             
1     0     201302
1     0     201303
1    120    201304
1     0     201305
      .
2     0     201302
2     0     201303

I want to create an n_months_no_income column that counts how many consecutive months a given individual has had wage == 0, like this:

id   wage   date     n_months_no_income
1    100    201212             0
1    100    201301             0
1     0     201302             1
1     0     201303             2
1    120    201304             0
1     0     201305             1
      .                        .
2     0     201302             1
2     0     201303             2

I suspect it's some combination of groupby('id'), cumcount(), maybe diff() or apply(), and then fillna(0), but I can't find the right one.

Do you have any ideas?

Here's an example for the dataframe for ease of replication:

import pandas as pd

df = pd.DataFrame({'id':[1,1,1,1,1,1,2,2],'wage':[100,100,0,0,120,0,0,0],
 'date':[201212,201301,201302,201303,201304,201305,201302,201303]})

Edit: Added code for ease of use.


Solution

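  • Use two groupby calls: group by id, then within each id build an additional key with x.ne(0).cumsum(), which increments at every non-zero wage, and take the running sum of the zero-wage flags inside each run. Summing the flags (rather than using cumcount, which starts at 0) also gives the right count when a group begins with wage 0, as id 2 does.

    df.groupby('id').wage.apply(lambda x : x.eq(0).astype(int).groupby(x.ne(0).cumsum()).cumsum())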
    Out[333]: 
    0    0
    1    0
    2    1
    3    2
    4    0
    5    1
    Name: wage, dtype: int64
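
To attach the result as the n_months_no_income column and check it against the expected table in the question, one option is an apply-free variant of the same run-key idea. This is a minimal sketch, not the answer's exact code: the names is_zero and run_key are introduced here for illustration, and it assumes the rows are already sorted by date within each id. Because it sums the zero-wage flags within each run, a group that starts with wage 0 (id 2 in the sample data) counts from 1.

```python
import pandas as pd

df = pd.DataFrame({
    'id':   [1, 1, 1, 1, 1, 1, 2, 2],
    'wage': [100, 100, 0, 0, 120, 0, 0, 0],
    'date': [201212, 201301, 201302, 201303, 201304, 201305, 201302, 201303],
})

# 1 where the wage is zero, 0 otherwise (illustrative helper name)
is_zero = df['wage'].eq(0).astype(int)

# Run key: increments at every non-zero wage and restarts for each id,
# so a block of consecutive zeros shares the key of the non-zero row before it
run_key = df['wage'].ne(0).astype(int).groupby(df['id']).cumsum()

# Running total of zero-wage months within each (id, run) pair
df['n_months_no_income'] = is_zero.groupby([df['id'], run_key]).cumsum()

print(df)
```

Grouping by the pair (id, run_key) keeps the result aligned with the original index, so it can be assigned directly as a column without worrying about how apply labels its output.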