I have a DataFrame with a lot of "bad" cells. Let's say they all have the value -99.99, and I want to remove them (set them to NaN).
This works fine:
df[df == -99.99] = None
But actually I want to delete these cells ONLY if another cell in the same row is marked as 1 (e.g. in the column "Error").
In other words, I want to delete all -99.99 cells, but only in rows where df["Error"] == 1.
The most straightforward solution, I think, is something like
df[(df == -99.99) & (df["Error"] == 1)] = None
but it gives me the error:
ValueError: cannot reindex from a duplicate axis
I've tried every solution I could find online, but I can't get it to work! :(
Since my DataFrame is big, I don't want to iterate over it (which would of course work, but take a lot of time).
Any hints?
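Try using broadcasting by passing NumPy values. df[['Errors']] == 1 is a one-column DataFrame, so .values gives an (n, 1) array that NumPy broadcasts across every column of df == -99, instead of pandas trying to align a Series' row labels with the column names: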
import numpy as np
import pandas as pd

# sample data, special value is -99
df = pd.DataFrame([[-99, -99, 1], [2, -99, 2],
                   [1, 1, 1], [-99, 0, 1]],
                  columns=['a', 'b', 'Errors'])

# note the double square brackets: they keep a 2-D (n, 1) shape
df[(df == -99) & (df[['Errors']] == 1).values] = np.nan
Output:
     a      b  Errors
0  NaN    NaN       1
1  2.0  -99.0       2
2  1.0    1.0       1
3  NaN    0.0       1
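A variant of the same broadcasting idea, sketched under a few assumptions: the sample frame below is made up (the columns x and y are hypothetical) but uses the question's -99.99 marker and "Error" column. It swaps exact == for np.isclose, a defensive choice in case the stored -99.99 values came out of arithmetic rather than straight from parsing, and uses DataFrame.mask, which returns a new frame instead of modifying df in place. The row condition becomes an (n, 1) column via [:, None], so it broadcasts the same way .values on a one-column frame does.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data mirroring the question: -99.99 marks a bad reading,
# and the "Error" column flags rows whose bad readings should be removed.
df = pd.DataFrame({
    "x": [-99.99, 1.5, -99.99, 3.0],
    "y": [-99.99, -99.99, 2.0, 4.0],
    "Error": [1, 0, 1, 1],
})

# Cell-level condition: compare with a tolerance instead of exact float equality.
is_bad = np.isclose(df.to_numpy(dtype=float), -99.99)

# Row-level condition as an (n, 1) column vector, so NumPy broadcasts it
# across every column instead of pandas aligning it by label.
row_flagged = (df["Error"] == 1).to_numpy()[:, None]

# DataFrame.mask puts NaN wherever the condition is True and returns
# a new DataFrame; df itself is left unchanged.
cleaned = df.mask(is_bad & row_flagged)
print(cleaned)
```

Row 1 keeps its -99.99 in y because its Error value is 0. To update the original frame, assign the result back (df = df.mask(...)).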