I have a DataFrame that has a column (AE) that could contain: nothing (""), "X", "A" or "E".
I want to drop all the rows that have the value "X" in that column.
I searched and found two ways of doing it:
df = df.drop(df[df.AE == "X"].index)
or
df = df[df["AE"] != "X"]
But for some reason, the first way of doing it drops more rows than it should.
Do the two lines of code do the same thing?
There seems to be a mistake I'm making when trying to do this "drop" using the first approach.
They are not the same.
df = df.drop(df[df.AE == "X"].index)
This drops rows by their index label. If the index is not unique, the labels of the rows where df['AE'] == "X" may also belong to other rows, and those rows get dropped as well.
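To make this concrete, here is a minimal sketch of how a non-unique index can cause the extra drops. The data is made up for illustration, and I am assuming the duplicate labels come from something like a pd.concat without ignore_index=True; your DataFrame may have ended up with repeated labels some other way.

```python
import pandas as pd

# Hypothetical data: two frames concatenated without ignore_index=True,
# so the labels 0 and 1 each appear twice in the resulting index.
part1 = pd.DataFrame({"AE": ["X", "A"]})
part2 = pd.DataFrame({"AE": ["E", ""]})
df = pd.concat([part1, part2])

print(df.index.tolist())  # [0, 1, 0, 1]

# Only one row has AE == "X", and its label is 0 ...
labels_to_drop = df[df.AE == "X"].index

# ... but drop() removes every row carrying label 0, including the "E" row.
dropped = df.drop(labels_to_drop)

print(dropped["AE"].tolist())  # ['A', '']
```

Only one row matched "X", yet two rows are gone, because the "E" row shares label 0 with it.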
df = df[df["AE"] != "X"]
Here we are filtering the DataFrame with a boolean mask and keeping all rows where df["AE"] is different from "X". The index values play no role: nothing is dropped by label, the rows that meet the criterion are simply kept.
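If you want to check whether this is what is happening in your case, df.index.is_unique tells you whether any labels repeat. The sketch below uses made-up data with a deliberately duplicated index; it shows that the boolean mask keeps the expected rows, and that the drop version agrees once the index has been made unique with reset_index(drop=True). Whether discarding the old index is acceptable depends on your data, so treat that step as one possible fix rather than the only one.

```python
import pandas as pd

# Hypothetical data with a deliberately non-unique index.
df = pd.DataFrame({"AE": ["X", "A", "E", ""]}, index=[0, 1, 0, 1])

print(df.index.is_unique)  # False

# The boolean mask is applied row by row, so duplicate labels do not matter.
kept = df[df["AE"] != "X"]
print(kept["AE"].tolist())  # ['A', 'E', '']

# With a fresh, unique index, drop() removes only the matching row.
unique_df = df.reset_index(drop=True)
via_drop = unique_df.drop(unique_df[unique_df.AE == "X"].index)
print(via_drop["AE"].tolist())  # ['A', 'E', '']
```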