Tags: python, pandas, group-by

Count consecutive boolean values in Python/pandas array for whole subset


I am looking for a way to group a pandas DataFrame by runs of consecutive identical values and apply an aggregation such as count or max to each run.

For example, given a single column in df:

    my_column
0        0  
1        0  
2        1  
3        1  
4        1  
5        0  
6        0  
7        0  
8        0  
9        1  
10       1  
11       0

The result needs to be:

    result
0        2  
1        2  
2        3  
3        3  
4        3  
5        4  
6        4  
7        4  
8        4  
9        2  
10       2  
11       1

Why: there are two 0s at the beginning, then three 1s, ...

What I need is similar to this answer, except that every element in a group should get the same value.

The preferred answer would show how to group the consecutive identical elements and apply an aggregation function to each group, so that I could also compute, for example, the max of another column:

    my_column    other_value
0        0           7
1        0           4
2        1           1
3        1           0
4        1           5
5        0           1
6        0           1
7        0           2
8        0           8
9        1           1
10       1           0
11       0           2

And the result would be:

    result
0        7  
1        7  
2        5  
3        5  
4        5  
5        8  
6        8  
7        8  
8        8  
9        1  
10       1  
11       2

Solution

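  • You can build a run identifier that increments whenever the value changes, then group by it and broadcast the count back to every row with transform: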

    g = df["my_column"].ne(df["my_column"].shift()).cumsum()
    
    out = df.groupby(g)["my_column"].transform("count")
    

    Output:

    print(out)

        my_column
    0           2
    1           2
    2           3
    3           3
    4           3
    5           4
    6           4
    7           4
    8           4
    9           2
    10          2
    11          1
    

    NB: to get the max of other_value within each run, use df.groupby(g)["other_value"].transform("max").
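
For reference, here is a minimal self-contained sketch of the approach above, using the sample data from the question. The helper name `per_run` and the variable names `run_id`, `run_len` and `run_max` are made up for illustration and are not part of pandas.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "my_column":   [0, 0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 0],
    "other_value": [7, 4, 1, 0, 5, 1, 1, 2, 8, 1, 0, 2],
})

# ne(shift()) is True on every row whose value differs from the previous row
# (the first row is compared against NaN, so it is True as well);
# the cumulative sum of those flags gives one integer label per run.
run_id = df["my_column"].ne(df["my_column"].shift()).cumsum()


def per_run(frame, key, value, func):
    """Hypothetical helper: apply `func` to `value` within each run of `key`,
    returning a Series aligned with the original rows."""
    runs = frame[key].ne(frame[key].shift()).cumsum()
    return frame.groupby(runs)[value].transform(func)


run_len = per_run(df, "my_column", "my_column", "count")
run_max = per_run(df, "my_column", "other_value", "max")

print(run_id.tolist())   # [1, 1, 2, 2, 2, 3, 3, 3, 3, 4, 4, 5]
print(run_len.tolist())  # [2, 2, 3, 3, 3, 4, 4, 4, 4, 2, 2, 1]
print(run_max.tolist())  # [7, 7, 5, 5, 5, 8, 8, 8, 8, 1, 1, 2]
```

Note that "count" skips missing values in the counted column. If the column can contain NaN and you want the raw run length, transform("size") may be the safer choice.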