python pandas replace conditional-statements cell

Pandas - change cell value based on conditions from cell and from column


I have a DataFrame with a lot of "bad" cells. Let's say they all have the value -99.99, and I want to remove them (set them to NaN).

This works fine:

df[df == -99.99] = None

But actually I want to delete these cells ONLY if another cell in the same row is marked as 1 (e.g. in the column "Error").

I want to delete all -99.99 cells, but only if df["Error"] == 1.

The most straightforward solution, I think, is something like

df[(df == -99.99) & (df["Error"] == 1)] = None

but it gives me the error:

ValueError: cannot reindex from a duplicate axis

I tried every solution I could find on the internet, but I can't get it to work! :(

Since my DataFrame is big, I don't want to iterate over it (which would work, of course, but would take a lot of time).

Any hint?


Solution

  • Try converting the row condition to NumPy values so that it broadcasts across the columns:

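    import numpy as np
    import pandas as pd

    # sample data; the special "bad" value is -99
    df = pd.DataFrame([[-99, -99, 1], [2, -99, 2],
                       [1, 1, 1], [-99, 0, 1]],
                      columns=['a', 'b', 'Errors'])

    # note the double square brackets: df[['Errors']] is a one-column DataFrame,
    # so .values has shape (n, 1) and broadcasts across every column
    df[(df == -99) & (df[['Errors']] == 1).values] = np.nan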
    

    Output:

         a     b  Errors
    0  NaN   NaN       1
    1  2.0 -99.0       2
    2  1.0   1.0       1
    3  NaN   0.0       1
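
A variation on the same idea, offered only as a sketch: build the whole mask as a plain NumPy array and hand it to `DataFrame.mask`, which returns a new frame instead of assigning in place. The sample data is the same as above; the names `bad`, `row_flag`, `mask` and `result` are illustrative and not part of the original answer.

```python
import numpy as np
import pandas as pd

# Same sample data as above; -99 stands in for the "bad" value.
df = pd.DataFrame([[-99, -99, 1], [2, -99, 2],
                   [1, 1, 1], [-99, 0, 1]],
                  columns=["a", "b", "Errors"])

# Cell-level condition as a plain boolean array, shape (n_rows, n_cols).
bad = df.eq(-99).to_numpy()

# Row-level condition reshaped to (n_rows, 1) so it broadcasts over columns.
row_flag = df["Errors"].eq(1).to_numpy()[:, None]

# Both operands are NumPy arrays, so they are combined by position only.
mask = bad & row_flag

# DataFrame.mask replaces cells where the condition is True (NaN by default).
result = df.mask(mask)
print(result)
```

Because both conditions are NumPy arrays by the time they are combined, no index alignment is involved in building the mask; the original `df` is left unchanged.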