
Parsing a product into monomials in Python quickly?


I have a monomial written as a product string, like m below, and I want to parse it into an expression that uses exponents.

from sympy.parsing.sympy_parser import parse_expr

m = "n0*n0"
parse_expr(m)  # unknown names such as n0 become sympy Symbols automatically

This gives me the following correct output:

n0**2

The problem is that I have millions of such monomials, over thousands of parameters, in a dataframe column. Right now I use apply to parse them, and this takes forever. Is there a better solution? eval requires me to give a value to n0. I don't want to evaluate anything; I just want to parse monomials like this into a succinct representation.


Solution

  • In an isympy session:

    In [38]: parse_expr("n0*n0")
    Out[38]: 
      2
    n₀ 
    
    In [39]: type(_)
    Out[39]: sympy.core.power.Pow
    

    The parsing takes a string and returns a sympy expression.

    That is not a particularly fast operation:

    In [40]: timeit parse_expr("n0*n0")
    524 µs ± 79.5 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
    

    Compare that with doing a simple string replace:

    In [42]: "n0*n0".replace("*n0","**2")
    Out[42]: 'n0**2'
    
    In [43]: timeit "n0*n0".replace("*n0","**2")
    118 ns ± 1.04 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
    

    This is the first time I've seen someone try to use sympy with pandas. People do try to use it with numpy, often with confusing results; sympy.lambdify is usually the only reliable way to create a numpy function from a sympy expression. sympy is good for algebraic math, but I can't picture it being useful with large dataframes.
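The replace trick above only covers the literal string n0*n0. A minimal sketch of a more general pure-Python version, assuming every monomial is a plain product of variable names joined by * (no numeric coefficients, parentheses or existing ** exponents): split on *, count repeated factors with collections.Counter, and rebuild the string without ever creating sympy objects. functools.lru_cache is added on the assumption that many rows in the column repeat, so each distinct string is processed only once. The name compress_monomial is hypothetical.

```python
from collections import Counter
from functools import lru_cache


@lru_cache(maxsize=None)  # repeated monomial strings are rewritten only once
def compress_monomial(m: str) -> str:
    """Rewrite a product such as 'n0*n1*n0' as 'n0**2*n1'.

    Assumes m is a plain product of variable names joined by '*':
    no numeric coefficients, parentheses or existing '**' exponents.
    """
    # Count each factor; Counter preserves the first-seen order of the names.
    counts = Counter(m.split("*"))
    # Keep single factors as-is and write repeated ones as name**k.
    return "*".join(
        name if k == 1 else f"{name}**{k}" for name, k in counts.items()
    )


column = ["n0*n0", "n0*n1*n0*n2", "n0*n0", "n5"]
print([compress_monomial(m) for m in column])
# ['n0**2', 'n0**2*n1*n2', 'n0**2', 'n5']
print(compress_monomial.cache_info().hits)  # 1 (the second "n0*n0")
```

With pandas, the same function can be passed to Series.map, e.g. df["monomial"].map(compress_monomial), where the column name is hypothetical. Factor order follows the input, so n1*n0 and n0*n1 produce different strings; if a canonical form is needed, sort the names before joining (keeping in mind that plain string sorting puts n10 before n2).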