Search code examples
pythonpandasdataframecumsum

Pandas - Groupby with cumsum or cumcount


I have the following dataframe:

       Vela    FlgVela  
0         R      0
1         V      1      
2         V      1      
3         R      1      
4         R      1
5         V      0
6         R      1
7         R      1
8         R      1

What is the best way to get the result of the dataframe below?

       Vela    FlgVela  AddCol
0         R      0         1
1         V      1         2
2         V      1         2
3         R      1         3
4         R      1         3
5         V      0         4
6         R      1         5
7         R      1         5
8         R      1         5
  

I have tried the following logic but the result is not what I expected.

df['AddCol'] = df.groupby(df['Vela'].astype(str).str.strip() !=  df['Vela'].shift(-1).astype(str).str.strip() ).cumcount()+1

Solution

  • I think you're close, here is one way:

    df["AddCol"] = df.groupby("Vela").ngroup().diff().ne(0).cumsum()
    

    where we first get the group number each distinct Vela belongs to (kind of factorize) then take the first differences and see if they are not equal to 0. This will sort of give the "turning" points from one group to another. Then we cumulatively sum them,

    to get

    >>> df
    
      Vela  FlgVela  AddCol
    0    R        0       1
    1    V        1       2
    2    V        1       2
    3    R        1       3
    4    R        1       3
    5    V        0       4
    6    R        1       5
    7    R        1       5
    8    R        1       5