python, pandas, eval

How to evaluate an expression in a pandas df that finds the greater of two variables?


I want to compute a value that is the maximum of two columns, as follows:

import pandas as pd
from numpy import maximum

df = pd.DataFrame( {'a':[1,3,5],
                    'b':[6,4,2] } )

# This line raises the ValueError quoted below
df['c'] = df.eval('maximum(a,b)')

I am getting 'ValueError: "maximum" is not a supported function', and the error persists even when I use engine='python'. This surprised me, because maximum is a NumPy ufunc. The expression has to be supplied as an external string, so I cannot simply rewrite it as ordinary Python code. How should I proceed?

eval('maximum(df.a, df.b)') 

does work fine, but I'd rather not do this for readability reasons.


Solution

  • One possibility is to import numpy.maximum and call it from the expression as a local variable, by prefixing its name with @:

    import pandas as pd
    from numpy import maximum
    
    df = pd.DataFrame( {'a':[1,3,5],
                        'b':[6,4,2] } )
    
    # '@maximum' is resolved from the calling Python scope, not from the columns
    df['c'] = df.eval('@maximum(a,b)')
    

    Output:

       a  b  c
    0  1  6  6
    1  3  4  4
    2  5  2  5
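
Since the question says the expression arrives as an external string, here is a minimal sketch of that setup. The variable name expr and the cross-check against a row-wise max are assumptions added for illustration and are not part of the original answer. The point is that whatever name follows @ in the string must exist in the scope that calls df.eval.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 3, 5],
                   'b': [6, 4, 2]})

# Hypothetical externally supplied expression. The '@' prefix tells eval to
# look the name up in the calling Python scope instead of among the columns.
expr = '@maximum(a, b)'

# The name used after '@' has to be defined in the calling scope.
maximum = np.maximum

df['c'] = df.eval(expr)

# Cross-check without eval: row-wise maximum over the two columns.
expected = df[['a', 'b']].max(axis=1)
assert (df['c'] == expected).all()

print(df)
```

If the expression string does not need to be kept, the row-wise max in the cross-check is a plain alternative that avoids eval altogether.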