Search code examples
pythonpandasstatisticsrowsfrequency

create groupby based on label of a column in python


I have a large dataframe which is similar like this:

id        price             status
1           23               none
2           23               none
3           34               none
4           32               none
5           31               none
6           37               none
7           20               none
8           29               none
9           21               none
10          22               done

and I want to make a groupby based on the status. I want to make a group where the situation like this: every time the status is done, it will be one group.

So far, what I have done is making a group based on the index:

grouper = df.groupby(df.index // 10)

but then I realize that the status is written done randomly and not always every 10 rows.

How can I make it in python? Thankyou


Solution

  • Compare done values and cretae groups by cumulative sum from back by iloc[::-1], last add another iloc[::-1] for original order of column:

    g = df['status'].eq('done').iloc[::-1].cumsum().iloc[::-1]
    grouper = df.groupby(g, sort=False)
    

    Sample:

    #chnaged data for more groups   
    print (df)
       id  price status
    0   1     23   none
    1   2     23   done
    2   3     34   none
    3   4     32   none
    4   5     31   done
    5   6     37   none
    6   7     20   none
    7   8     29   none
    8   9     21   none
    9  10     22   done
    
    g = df['status'].eq('done').iloc[::-1].cumsum().iloc[::-1]
    print (g)
    0    3
    1    3
    2    2
    3    2
    4    2
    5    1
    6    1
    7    1
    8    1
    9    1
    Name: status, dtype: int32
    

    grouper = df.groupby(g, sort=False)
    
    for name, df in grouper:
        print (df)
    
       id  price status
    0   1     23   none
    1   2     23   done
       id  price status
    2   3     34   none
    3   4     32   none
    4   5     31   done
       id  price status
    5   6     37   none
    6   7     20   none
    7   8     29   none
    8   9     21   none
    9  10     22   done