This is my DataFrame:
import pandas as pd
df = pd.DataFrame(
{
'a': ['x', 'x', 'x', 'x', 'x', 'y', 'y', 'y', 'y', 'y', 'y', 'y'],
'b': [1, 1, 1, 2, 2, 1, 1, 1, 2, 2, 2, 2],
'c': [9, 8, 11, 13, 14, 3, 104, 106, 11, 100, 70, 7]
}
)
Expected output: Creating column out
:
a b c out
0 x 1 9 NaN
1 x 1 8 NaN
2 x 1 11 NaN
3 x 2 13 found
4 x 2 14 NaN
5 y 1 3 NaN
6 y 1 104 found
7 y 1 106 NaN
8 y 2 11 NaN
9 y 2 100 NaN
10 y 2 70 NaN
11 y 2 7 NaN
The mask is:
mask = (df.c > 10)
The process: Grouping is by column a
:
a) For each group, finding the first row that meets the conditions of the mask
.
b) For group x
this condition only applies when b == 2
. That is why row 3
is selected.
And this is my attempt. It is getting close but it feels like this is not the way:
def func(g):
mask = (g.c > 10)
g.loc[mask.cumsum().eq(1) & mask, 'out'] = 'found'
return g
df = df.groupby('a').apply(func)
One option with groupby.idxmax
:
mask = (df['c'] > 10) & (df['a'].ne('x') | df['b'].eq(2))
idx = mask.groupby(df['a']).idxmax()
df.loc[idx[mask.loc[idx].values], 'out'] = 'found'
Another with groupby.transform
:
mask = (df['c'] > 10) & (df['a'].ne('x') | df['b'].eq(2))
df.loc[mask & mask.groupby(df['a'])
.transform(lambda m: (~m).shift(fill_value=True)
.cummin()),
'out'] = 'found'
Output, with an extra group z
that has no match:
a b c out
0 x 1 9 NaN
1 x 1 8 NaN
2 x 1 11 NaN
3 x 2 13 found
4 x 2 14 NaN
5 y 1 3 NaN
6 y 1 104 found
7 y 1 106 NaN
8 y 2 11 NaN
9 y 2 100 NaN
10 y 2 70 NaN
11 y 2 7 NaN
12 z 3 1 NaN
13 z 3 1 NaN
to get the last match (instead of the first one), just inver the mask:
Example:
mask = (df['c'] > 10) & (df['a'].ne('x') | df['b'].eq(2))
mask = mask[::-1]
idx = mask.groupby(df['a']).idxmax()
df.loc[idx[mask.loc[idx].values], 'out'] = 'found'
a b c out
0 x 1 9 NaN
1 x 1 8 NaN
2 x 1 11 NaN
3 x 2 13 NaN
4 x 2 14 found
5 y 1 3 NaN
6 y 1 104 NaN
7 y 1 106 NaN
8 y 2 11 NaN
9 y 2 100 NaN
10 y 2 70 found
11 y 2 7 NaN
12 z 3 1 NaN
13 z 3 1 NaN