Python: error when using max() in Numpy where method for defining a new column in pandas dataframe


I got an error when I used Python's built-in max function inside NumPy's where function. The goal is to create a new column based on the condition passed to np.where.

I used the following function:

def function(df):
    df["new col"] = np.where(df["col 1"] > 10, max(df["col 1"] - df["col 2"], 0), 0)
    return df

The error I got is as follows:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

However, if I remove the 0 from max(), the code runs without error. I need the zero in the max function to avoid negative values.

 df["new col"]= np.where(df["col 1"]> 10, max(df["col 1"]-df["col 2"]),0)


Solution

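  • The error comes from Python's built-in max, not from np.where. With two arguments, max compares them with >, which produces a boolean Series, and then tries to reduce that Series to a single True/False, which pandas refuses to do. To avoid the error, replace the built-in max with NumPy's np.maximum (element-wise maximum of two inputs) or np.max (single largest value of one input), depending on what you're trying to achieve.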

    Example:

    import pandas as pd
    import numpy as np
    
    df = pd.DataFrame ({"col 1":[1,20,3,40],"col 2":[10,2,30,4]})
    

    Using np.maximum:

    df["new col"]= np.where(df["col 1"]> 10, np.maximum(df["col 1"]-df["col 2"],0),0)
    

    Output:

       col 1  col 2  new col
    0      1     10        0
    1     20      2       18
    2      3     30        0
    3     40      4       36
    

    Here, each position where col 1 > 10 receives col 1 - col 2 for that same row, or 0 if that difference is negative. All other positions receive 0.

    Using np.max (here the second argument, 0, is the axis to reduce over, not a lower bound):

    df["new col"]= np.where(df["col 1"]> 10, np.max(df["col 1"]-df["col 2"],0),0)
    

    Output:

       col 1  col 2  new col
    0      1     10        0
    1     20      2       36
    2      3     30        0
    3     40      4       36
    

    Here, every position where col 1 > 10 receives the same value, the largest col 1 - col 2 over the whole column (36 in this example), while the other positions receive 0.
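
Putting the pieces together, here is a minimal sketch that first reproduces the failure of the built-in max and then applies the np.maximum variant inside a function like the one in the question. The function name add_new_col and the variables diff and builtin_error are made up for illustration, and the function returns a copy rather than modifying df in place, which is a choice of this sketch and not something the question requires.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"col 1": [1, 20, 3, 40], "col 2": [10, 2, 30, 4]})
diff = df["col 1"] - df["col 2"]

# The built-in max compares its two arguments with ">", which yields a
# boolean Series; converting that Series to a single bool raises ValueError.
try:
    max(diff, 0)
    builtin_error = None
except ValueError as exc:
    builtin_error = exc


def add_new_col(df):
    """Hypothetical helper: row-wise col 1 - col 2, floored at 0, where col 1 > 10."""
    out = df.copy()  # work on a copy so the caller's frame is left untouched
    floored = np.maximum(out["col 1"] - out["col 2"], 0)  # element-wise maximum with 0
    out["new col"] = np.where(out["col 1"] > 10, floored, 0)
    return out


result = add_new_col(df)
print(type(builtin_error).__name__)
print(result)
```

If the per-row floor at zero is what you need, np.maximum is the variant to use; np.max would instead fill every matching row with one column-wide value.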