import pandas as pd

df_example = pd.DataFrame({'name': ['a', 'a', 'a', 'b', 'b', 'b'],
                           'class': [1, 2, 2, 3, 2, 2],
                           'price': [3, 4, 2, 1, 6, 5]})
Within each name group, I want to keep the rows whose price is larger than the smallest price among that group's class == 2 rows:
# sort by price within each name so that .first() picks the smallest class-2 price
df_example.sort_values(['name', 'price'], inplace=True)
df_tem = df_example[df_example['class'] == 2].groupby('name').first()
Below is the idea, written with groupby/apply so that it runs (g.name is the key of the current group):
df_example.groupby('name', group_keys=False).apply(lambda g: g[g['price'] > df_tem.loc[g.name, 'price']])
Is there an efficient way to filter a DataFrame based on a subset within each groupby group?
Result: df_tem, the smallest price with class == 2 for each name group:
      class  price
name
a         2      2
b         2      5
Therefore the conditions are price > 2 for group a and price > 5 for group b.
The expected output:
pd.DataFrame({'name': ['a', 'a', 'b'],
              'class': [1, 2, 2],
              'price': [3, 4, 6]})
Update:
Actually, I have an idea: create a new column called smallest, then filter df_example with df_example['price'] > df_example['smallest'].
Do you know how to quickly create such a column? Something like
df_example['smallest'] = df_example[df_example['class'] == 2].groupby('name')['price'].transform('first')
The above still leaves NaN in the rows whose class is not 2.
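The NaN values come from index alignment: transform runs on the class == 2 subset only, so its result is indexed by that subset's rows, and every other row gets NaN when it is assigned back. A minimal sketch of one way around this, assuming 'min' is acceptable in place of 'first' (they agree here because the frame was sorted by price), is to blank out the non-class-2 prices with where and then transform over the whole frame. The name class2_price is illustrative only.

```python
import pandas as pd

df_example = pd.DataFrame({'name': ['a', 'a', 'a', 'b', 'b', 'b'],
                           'class': [1, 2, 2, 3, 2, 2],
                           'price': [3, 4, 2, 1, 6, 5]})

# Keep prices only where class == 2; every other row becomes NaN.
class2_price = df_example['price'].where(df_example['class'] == 2)

# Group the masked prices by the full name column; transform('min') skips NaN
# and broadcasts each group's minimum back to every row of that group.
df_example['smallest'] = class2_price.groupby(df_example['name']).transform('min')

out = df_example[df_example['price'] > df_example['smallest']]
print(out)
```

A name with no class == 2 row gets NaN in smallest, and because comparisons with NaN are False, its rows are dropped by this filter.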
Code
Use groupby to aggregate the minimum class-2 price per name, then use map to align that result with the name column and use it for boolean indexing.
m = df_example[df_example['class'] == 2].groupby('name')['price'].min()
out = df_example[df_example['price'] > df_example['name'].map(m)]
out
  name  class  price
0    a      1      3
1    a      2      4
4    b      2      6
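To make the map step visible, here is a small sketch that prints the intermediate Series on the same data (threshold and mask are illustrative names): m is indexed by name, and map looks up each row's name in it, giving a Series aligned with df_example's row index that can be compared with price directly.

```python
import pandas as pd

df_example = pd.DataFrame({'name': ['a', 'a', 'a', 'b', 'b', 'b'],
                           'class': [1, 2, 2, 3, 2, 2],
                           'price': [3, 4, 2, 1, 6, 5]})

# Minimum class-2 price per name: a Series indexed by name ('a', 'b').
m = df_example[df_example['class'] == 2].groupby('name')['price'].min()
print(m)

# Look up each row's name in m; the result carries df_example's row index.
threshold = df_example['name'].map(m)
print(threshold)

# Row-wise comparison, then boolean indexing.
mask = df_example['price'] > threshold
print(df_example[mask])
```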
Update (additional question):
If you also want to keep the rows of names that have no class == 2 row at all, use the code below:
m = df_example[df_example['class'] == 2].groupby('name')['price'].min()
cond1 = df_example['price'] > df_example['name'].map(m)  # above the group's class-2 minimum
cond2 = ~df_example['name'].isin(m.index)                 # name has no class-2 row
out = df_example[cond1 | cond2]
New example:
df_example = pd.DataFrame({'name': ['a', 'a', 'a', 'b', 'b', 'b', 'c'],
                           'class': [1, 2, 2, 3, 2, 2, 3],
                           'price': [3, 4, 2, 1, 6, 5, 5]})
out:
  name  class  price
0    a      1      3
1    a      2      4
4    b      2      6
6    c      3      5
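As an alternative to cond1 | cond2 (not part of the answer above), the two conditions can be folded into one. This is a sketch assuming price has no missing values: names missing from m map to NaN, and filling that with negative infinity makes price > threshold true for every row of such a name. The name threshold is illustrative only.

```python
import pandas as pd

df_example = pd.DataFrame({'name': ['a', 'a', 'a', 'b', 'b', 'b', 'c'],
                           'class': [1, 2, 2, 3, 2, 2, 3],
                           'price': [3, 4, 2, 1, 6, 5, 5]})

m = df_example[df_example['class'] == 2].groupby('name')['price'].min()

# Names absent from m (here 'c') map to NaN; -inf keeps all of their rows.
threshold = df_example['name'].map(m).fillna(float('-inf'))
out = df_example[df_example['price'] > threshold]
print(out)
```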