
Different ways to conditionally drop rows in Pandas


I have a DataFrame that has a column (AE) that could contain: nothing (""), "X", "A" or "E".

I want to drop all the rows that have the value "X" in that column.

I searched and found two ways of doing it:

df = df.drop(df[df.AE == "X"].index)

or

df = df[df["AE"] != "X"]

But for some reason, the first approach drops more rows than it should.

Do the two lines of code do the same thing?

There seems to be a mistake I'm making when trying to do this "drop" using the first approach.


Solution

  • They are not the same.

    df = df.drop(df[df.AE == "X"].index)
    

    This drops rows by their index label. If the index is not unique, a label that belongs to a row where df['AE'] == "X" may also belong to other rows, and those rows are dropped as well.

    df = df[df["AE"] != "X"]
    

    This filters the DataFrame with a boolean mask and keeps every row where df["AE"] is different from "X". The index values play no part, and nothing is dropped by label: the rows that meet the criterion are simply kept.
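
The difference can be reproduced with a small sketch. The data below is made up for illustration: it assumes the index contains a repeated label (here 0), which can happen, for example, after concatenating frames without resetting the index. The variable names are hypothetical, not taken from the question.

```python
import pandas as pd

# Hypothetical data: index label 0 appears twice (one "X" row, one "E" row).
df = pd.DataFrame(
    {"AE": ["X", "A", "E", "", "X"]},
    index=[0, 1, 0, 2, 3],
)

# Approach 1: drop by index label.
# The "X" rows carry labels 0 and 3, so every row labelled 0 or 3 is removed,
# including the "E" row that happens to share label 0.
dropped_by_label = df.drop(df[df.AE == "X"].index)

# Approach 2: boolean mask. Only the row values matter, not the labels.
kept_by_mask = df[df["AE"] != "X"]

print(len(dropped_by_label))  # 2 rows left: one too few
print(len(kept_by_mask))      # 3 rows left: only the "X" rows are gone

# With a unique index, both approaches agree.
df_unique = df.reset_index(drop=True)
dropped_unique = df_unique.drop(df_unique[df_unique.AE == "X"].index)
print(len(dropped_unique))    # 3
```

If the first approach removes too many rows on the real data, checking `df.index.is_unique` is a quick way to see whether duplicate labels are the cause.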