Tags: python, python-3.x, pandas, dataframe, text-parsing

How to parse and evaluate a math expression with Pandas Dataframe columns?


What I would like to do is parse an expression such as this one:

result = A + B + sqrt(B + 4)

where A and B are columns of a dataframe. I would have to translate the expression into something like this to get the result:

from math import sqrt

new_col = df.B + 4
result = df.A + df.B + new_col.apply(sqrt)

Where df is the dataframe.

I have tried re.sub, but it only works for replacing the column names, not the functions:

import re

def repl(match):
    inner_word = match.group(1)
    new_var = "df['{}']".format(inner_word)
    return new_var

eq = 'A + 3 / B'
new_eq = re.sub('([a-zA-Z_]+)', repl, eq)
result = eval(new_eq)

So, my questions are:

  • Is there a Python library that does this? If not, how can I achieve it in a simple way?
  • Could a recursive function be the solution?
  • Would using reverse Polish notation simplify the parsing?
  • Would I have to use the ast module?

Solution

  • Pandas DataFrames have an eval method that does this. Using your example equation:

    import pandas as pd
    # create an example DataFrame to work with
    df = pd.DataFrame({"A": [1, 2], "B": [3, 4]})
    # define equation
    eq = 'A + 3 / B'
    # actual computation; column names are resolved against df
    result = df.eval(eq)

    # more complicated equation, using a supported math function
    eq = "A + B + sqrt(B + 4)"
    result = df.eval(eq)
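
Since the question writes the expression as an assignment (result = ...), it may help to note that DataFrame.eval also accepts an assignment in the expression string and then returns a copy of the DataFrame with the new column added. A minimal sketch, reusing the two-row example frame; the names out and expected are only illustrative:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, 2], "B": [3, 4]})

# An assignment inside the expression adds a column. With the default
# inplace=False, df itself is left untouched and a new frame is returned.
out = df.eval("result = A + B + sqrt(B + 4)")

# The same computation written out by hand, for comparison.
expected = df["A"] + df["B"] + np.sqrt(df["B"] + 4)

print(out)
```

Passing inplace=True instead would modify df directly, which is convenient in a notebook but easier to get wrong in a longer script.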
    

    Warning

    Keep in mind that eval can run arbitrary code, which makes you vulnerable to code injection if you pass user input to this function.
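
If the expression does come from user input, one way to reduce the risk is to check it with the ast module before handing it to df.eval. The sketch below is only an illustration, not a hardened sandbox: safe_eval, ALLOWED_FUNCS and ALLOWED_NODES are hypothetical names made up for this example, and it assumes Python 3.8+ (where literals parse as ast.Constant) and column names that are valid Python identifiers.

```python
import ast

import pandas as pd

# Hypothetical whitelists for this sketch; adjust them to your own needs.
ALLOWED_FUNCS = {"sqrt", "exp", "log", "abs"}
ALLOWED_NODES = (
    ast.Expression, ast.BinOp, ast.UnaryOp, ast.Call, ast.Name, ast.Load,
    ast.Constant, ast.Add, ast.Sub, ast.Mult, ast.Div, ast.Pow,
    ast.USub, ast.UAdd,
)


def safe_eval(df, expr):
    """Evaluate expr with df.eval after checking it against the whitelists."""
    tree = ast.parse(expr, mode="eval")
    nodes = list(ast.walk(tree))
    # Nodes used as the callee of a call, e.g. the `sqrt` in sqrt(B + 4).
    callees = {id(n.func) for n in nodes if isinstance(n, ast.Call)}
    for node in nodes:
        # Rejects attribute access, subscripts, keywords, lambdas, etc.
        if not isinstance(node, ALLOWED_NODES):
            raise ValueError(f"disallowed syntax: {type(node).__name__}")
        if isinstance(node, ast.Constant) and not isinstance(node.value, (int, float)):
            raise ValueError("only numeric constants are allowed")
        if isinstance(node, ast.Name):
            # Callee names must be whitelisted functions; all others must be columns.
            allowed = ALLOWED_FUNCS if id(node) in callees else set(df.columns)
            if node.id not in allowed:
                raise ValueError(f"unknown name: {node.id}")
    return df.eval(expr)


df = pd.DataFrame({"A": [1, 2], "B": [3, 4]})
result = safe_eval(df, "A + B + sqrt(B + 4)")
print(result)
```

This also touches on the question about the ast module: here it is used only to validate the expression, while pandas still does the actual parsing and evaluation.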