I would like to select a subset of my dataframe which satisfies the following condition: I have a dataframe that shows the result of different tests of three students. As soon as one of the students gets the result "poor", they can't be considered for the experiment and need to be dropped from the dataset. My Dataframe looks like this:
import pandas as pd
data = {'Name': ['Peter', 'Peter','Anna', 'Anna','Anna', 'Max'],
'Result': ["Good", "Good", "Good", "Good", "poor", "Very Good"],
}
df = pd.DataFrame (data, columns = ['Name','Points'])
This means that I would first need to look who has done poor to then delete every row with that Person in it. My desired outcome in this example would be:
df_res = pd.DataFrame({'Name': ('Peter', 'Peter', 'Max', 'Max'),
'Result': ("Good", "Good", "Very Good")})
Can anyone help me here? Especially deleting all the rows with the corresponding names in it is an obstacle for me.
Find Name
s of items that have a 'poor'
Result
, then use that to filter in records whose Name
s aren't in that list.
>>> df = pd.DataFrame(data) # leave out the columns parameter.
>>>
>>> df[~df.Name.isin(df[df.Result == 'poor'].Name.values)]
Name Result
0 Peter Good
1 Peter Good
5 Max Very Good
"Boolean masking" I think we call it.
Aren't we being a bit unfair to Anna - she has more good results than all the rest. So what - she had a bad day...
=) anyway...
You can also specifically use the .drop()
method too:
>>> df.drop(index=df[df.Name.isin(df[df.Result == 'poor'].Name)].index)
Name Result
0 Peter Good
1 Peter Good
5 Max Very Good