Search code examples
pythonpandasdataframemultiple-columnsdrop

multiple condition row dropping


I would like to select a subset of my dataframe which satisfies the following condition: I have a dataframe that shows the result of different tests of three students. As soon as one of the students gets the result "poor", they can't be considered for the experiment and need to be dropped from the dataset. My Dataframe looks like this:

import pandas as pd

data = {'Name':  ['Peter', 'Peter','Anna', 'Anna','Anna', 'Max'],
        'Result': ["Good", "Good", "Good", "Good", "poor", "Very Good"],
         }

df = pd.DataFrame (data, columns = ['Name','Points'])

This means that I would first need to look who has done poor to then delete every row with that Person in it. My desired outcome in this example would be:

df_res = pd.DataFrame({'Name': ('Peter', 'Peter', 'Max', 'Max'), 
                   'Result': ("Good", "Good", "Very Good")}) 

Can anyone help me here? Especially deleting all the rows with the corresponding names in it is an obstacle for me.


Solution

  • Find Names of items that have a 'poor' Result, then use that to filter in records whose Names aren't in that list.

    >>> df = pd.DataFrame(data) # leave out the columns parameter.
    >>>
    >>> df[~df.Name.isin(df[df.Result == 'poor'].Name.values)]
        Name     Result
    0  Peter       Good
    1  Peter       Good
    5    Max  Very Good
    

    "Boolean masking" I think we call it.

    Aren't we being a bit unfair to Anna - she has more good results than all the rest. So what - she had a bad day...

    =) anyway...

    You can also specifically use the .drop() method too:

    >>> df.drop(index=df[df.Name.isin(df[df.Result == 'poor'].Name)].index)
        Name     Result
    0  Peter       Good
    1  Peter       Good
    5    Max  Very Good