Search code examples
pythonpandasdataframecsvconditional-statements

How to drop multiple conditions including NaN using loc


First of all I am importing a csv file data to pandas

Now I have the following data

Schooltyp Name ParentName ParentAddress
Public Tom John Nanostreet
Private Bill Sally NaN
Public Ron Tony Burystreet
Public Danny Nate NaN
Private Stewart Ben PringleStreet

I need to remove data
where Schooltyp = Public and ParentAddress is null.

I tried different solutions. This is the latest solution I have currently used which results in an error due to the particular condition I chose (data['ParentAddress'].isnull)

data = pd.read_csv("dataload.csv", sep = ';',  error_bad_lines=False )

indexnames = data[(data['Schooltype']=='Public') & (data['ParentAddress'].isnull)].indexes
data.drop(indexnames, inplace = True)

data.to_csv('finaldata.csv', sep=';', index=False)

Am I using the right approach, is there a better way in doing this?


Solution

  • To remove all rows where Schooltyp is "Public" and ParentAddress is null:

    should_be_removed = (df['Schooltyp'] == 'Public') & df['ParentAddress'].isna()
    
    df.loc[~ should_be_removed]
    

    Result:

      Schooltyp     Name ParentName  ParentAddress
    0    Public      Tom       John     Nanostreet
    1   Private     Bill      Sally            NaN
    2    Public      Ron       Tony     Burystreet
    4   Private  Stewart        Ben  PringleStreet
    

    Notes:

    .ne() is equivalent to !=, just less typing.

    There is also a method .eq() which is the same as ==.

    To invert a condition, you can put ~ before it.