In column '2' of df, there are three possible values: X, Y and Z. I want to group rows by the value X along with any trailing Y values in the rows directly following X. I am not interested in preserving the Z values in the groups.
I have tried using groupby() like this: df.groupby(df['2'] == 'X'). However, this only splits the rows into two groups: the X rows and everything else.
How could I go about creating the groupings that I am after?
import pandas as pd

df = pd.DataFrame({'1': ['a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p'],
                   '2': ['Z','X','Y','Z','Z','X','X','Z','X','Y','Y','Z','X','Z','X','Y']})
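To see why the boolean attempt falls short, the sketch below collects the row labels of each group it produces. The column labels are the strings '1' and '2', so the column is selected with df['2']. Grouping on a True/False mask can only ever produce two groups, no matter how many X values there are.

```python
import pandas as pd

df = pd.DataFrame({'1': list('abcdefghijklmnop'),
                   '2': ['Z','X','Y','Z','Z','X','X','Z','X','Y','Y','Z','X','Z','X','Y']})

# A boolean key yields exactly two groups: False (Y and Z rows) and True (X rows)
attempt = {key: sub.index.tolist() for key, sub in df.groupby(df['2'] == 'X')}
print(attempt)
```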
Desired groupings:
1 b X
2 c Y
---------
5 f X
---------
6 g X
---------
8 i X
9 j Y
10 k Y
---------
12 m X
---------
14 o X
15 p Y
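As a reference for the desired groupings, here is a plain-Python loop sketch (trailing_groups is a hypothetical helper, not a pandas function). Each X opens a new group, each Y joins the most recently opened group, and Z rows are skipped. It assumes a Z does not close a group, so a Y appearing after a Z would still join the preceding X, and any Y before the first X is dropped.

```python
import pandas as pd

df = pd.DataFrame({'1': list('abcdefghijklmnop'),
                   '2': ['Z','X','Y','Z','Z','X','X','Z','X','Y','Y','Z','X','Z','X','Y']})

def trailing_groups(values):
    """Hypothetical helper: return lists of row positions, one list per X group."""
    groups = []
    for pos, value in enumerate(values):
        if value == 'X':
            groups.append([pos])        # every X starts a new group
        elif value == 'Y' and groups:
            groups[-1].append(pos)      # Y extends the most recent X group
        # Z rows, and any Y before the first X, are not kept
    return groups

groups = trailing_groups(df['2'])
for positions in groups:
    print(df.iloc[positions])
    print('---------')
```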
You can use:
>>> list(df.groupby(df['2'].eq('X').cumsum().loc[df['2'] != 'Z']))
[(1.0,
1 2
1 b X
2 c Y),
(2.0,
1 2
5 f X),
(3.0,
1 2
6 g X),
(4.0,
1 2
8 i X
9 j Y
10 k Y),
(5.0,
1 2
12 m X),
(6.0,
1 2
14 o X
15 p Y)]
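A compact way to check the result is to keep only the row labels of each group. This sketch uses the same grouping key as the answer; because the key Series is shorter than df, pandas aligns it to df's index, the missing rows become NaN and are dropped by groupby, and the keys come out as floats (1.0, 2.0, ...).

```python
import pandas as pd

df = pd.DataFrame({'1': list('abcdefghijklmnop'),
                   '2': ['Z','X','Y','Z','Z','X','X','Z','X','Y','Y','Z','X','Z','X','Y']})

# Counter that increases at each X; Z rows are removed from the key
grp = df['2'].eq('X').cumsum().loc[df['2'] != 'Z']

# Map each group key to the row labels it contains
groups = {key: sub.index.tolist() for key, sub in df.groupby(grp)}
print(groups)
```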
Details:
# Groups the same way as grp = df['2'].eq('X').cumsum().where(df['2'] != 'Z'), which keeps the Z rows as NaN instead of dropping them
grp = df['2'].eq('X').cumsum().loc[df['2'] != 'Z']
pd.concat([df, grp.rename('G')], axis=1)
# Output
1 2 G
0 a Z NaN
1 b X 1.0
2 c Y 1.0
3 d Z NaN
4 e Z NaN
5 f X 2.0
6 g X 3.0
7 h Z NaN
8 i X 4.0
9 j Y 4.0
10 k Y 4.0
11 l Z NaN
12 m X 5.0
13 n Z NaN
14 o X 6.0
15 p Y 6.0
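The key works because every X increments a running count, so an X and the Y rows after it share the same number, and masking out Z removes those rows from every group. The sketch below lays the intermediate steps side by side, then filters df before grouping so that the key and the frame share an index and the group keys stay integers.

```python
import pandas as pd

df = pd.DataFrame({'1': list('abcdefghijklmnop'),
                   '2': ['Z','X','Y','Z','Z','X','X','Z','X','Y','Y','Z','X','Z','X','Y']})

is_x = df['2'].eq('X')      # True on every row that starts a group
counter = is_x.cumsum()     # running count of X values seen so far
keep = df['2'] != 'Z'       # Z rows should not belong to any group

steps = pd.DataFrame({'2': df['2'], 'is_x': is_x, 'counter': counter, 'keep': keep})
print(steps)

# Filtering first avoids the NaN introduced by index alignment, so keys stay int
groups = {key: sub.index.tolist() for key, sub in df[keep].groupby(counter[keep])}
print(groups)
```

One caveat, assuming the real data can start with Y rows: those rows get a counter of 0 and would form their own group 0, so adding counter > 0 to the mask would drop them if that is not wanted.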