This is my dataframe:
df = pd.DataFrame({'a': [20, 21, 333, 444], 'b': [20, 20, 20, 20]})
I want to create column c by using this mask:
mask = (df.a >= df.b)
I want to find the last row that meets this condition and mark it in column c. The output that I want looks like this:
     a   b    c
0   20  20  NaN
1   21  20  NaN
2  333  20  NaN
3  444  20    x
I tried the code below but it didn't work:
df.loc[mask.cumsum().gt(1) & mask, 'c'] = 'x'
To build a mask that flags the last value satisfying a condition, use duplicated() with keep='last'. We know that mask contains at most two distinct values (True/False). If we can create another mask that flags the last occurrence of each of these values as True, then we can combine it with mask itself to get the desired mask. This is accomplished by ~mask.duplicated(keep='last'), because mask.duplicated(keep='last') flags every duplicate as True except the last occurrence, so its negation is True only at the last occurrence of each value.
import pandas as pd

df = pd.DataFrame({'a': [20, 21, 333, 444], 'b': [20, 20, 20, 20]})
mask = (df.a >= df.b)
df['c'] = pd.Series('x', df.index).where(mask & ~mask.duplicated(keep='last'))
If you want to slice or assign with loc instead, you can use the same combined mask:
df.loc[mask & ~mask.duplicated(keep='last'), 'c'] = 'x'
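In the example data every row satisfies the condition, so it may help to see the intermediate masks on data where it does not. The sketch below uses made-up values (the frame `demo` and its numbers are assumptions for illustration, not from the question) and only shows how the two masks combine.

```python
import pandas as pd

# Hypothetical data: the condition fails on the first and last rows.
demo = pd.DataFrame({'a': [10, 21, 333, 5], 'b': [20, 20, 20, 20]})
demo_mask = demo.a >= demo.b                            # [False, True, True, False]

# True at the last occurrence of each distinct value (the last True and the last False).
is_last_of_value = ~demo_mask.duplicated(keep='last')   # [False, False, True, True]

# Keep only the last occurrence whose value is itself True.
last_true = demo_mask & is_last_of_value                # [False, False, True, False]

demo.loc[last_true, 'c'] = 'x'
print(demo)
```

Only row 2 gets 'x' here: row 3 is the last False, so the final `& demo_mask` filters it out.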
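A shorter version of @mandy8055's answer is to call idxmax() to get the index of the highest cumulative sum (although this shows a FutureWarning on pandas 2.1.0). It works because idxmax() returns the first label at which the maximum occurs, and the cumulative sum first reaches its maximum at the last True row. As pointed out by @mozway, this works only as long as there's at least one True value in mask.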
df.loc[mask.cumsum().idxmax(), 'c'] = 'x'
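If mask might contain no True at all, the cumulative sum is zero everywhere and idxmax() returns the first label, so the first row would be marked by mistake. A minimal sketch of one way to guard against that follows; the helper name `flag_last_match` and its parameters are hypothetical, not part of pandas.

```python
import pandas as pd

def flag_last_match(frame, cond, col='c', value='x'):
    """Hypothetical helper: mark the last row where `cond` is True, if there is one."""
    out = frame.copy()
    # Start with an all-NaN object column so a string can be written into it.
    out[col] = pd.Series(index=out.index, dtype=object)
    if cond.any():
        # cumsum() first reaches its maximum at the last True row;
        # idxmax() returns the label of that first maximum.
        out.loc[cond.cumsum().idxmax(), col] = value
    return out

df = pd.DataFrame({'a': [20, 21, 333, 444], 'b': [20, 20, 20, 20]})
print(flag_last_match(df, df.a >= df.b))

# Made-up frame where no row satisfies the condition: column c stays empty.
no_match = pd.DataFrame({'a': [1, 2], 'b': [20, 20]})
print(flag_last_match(no_match, no_match.a >= no_match.b))
```

This assumes the index labels are unique, since loc with a repeated label would select every row carrying it.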