I have a large dataframe which is similar like this:
id price status
1 23 none
2 23 none
3 34 none
4 32 none
5 31 none
6 37 none
7 20 none
8 29 none
9 21 none
10 22 done
and I want to make a groupby based on the status. I want to make a group where the situation like this: every time the status is done, it will be one group.
So far, what I have done is making a group based on the index:
grouper = df.groupby(df.index // 10)
but then I realize that the status is written done randomly and not always every 10 rows.
How can I make it in python? Thankyou
Compare done
values and cretae groups by cumulative sum from back by iloc[::-1]
, last add another iloc[::-1]
for original order of column:
g = df['status'].eq('done').iloc[::-1].cumsum().iloc[::-1]
grouper = df.groupby(g, sort=False)
Sample:
#chnaged data for more groups
print (df)
id price status
0 1 23 none
1 2 23 done
2 3 34 none
3 4 32 none
4 5 31 done
5 6 37 none
6 7 20 none
7 8 29 none
8 9 21 none
9 10 22 done
g = df['status'].eq('done').iloc[::-1].cumsum().iloc[::-1]
print (g)
0 3
1 3
2 2
3 2
4 2
5 1
6 1
7 1
8 1
9 1
Name: status, dtype: int32
grouper = df.groupby(g, sort=False)
for name, df in grouper:
print (df)
id price status
0 1 23 none
1 2 23 done
id price status
2 3 34 none
3 4 32 none
4 5 31 done
id price status
5 6 37 none
6 7 20 none
7 8 29 none
8 9 21 none
9 10 22 done