
Regex: How to convert LaTeX fractions into an operable expression in Python?


I would like to create a parser that takes in any LaTeX-formatted string and returns an expression that Python can evaluate.

I am having a couple of issues with fractions. Here are some examples:

LaTeX (input)      Interoperable String (output)
\frac{1}{2}        ((1)/(2))
\frac{x}{3b}       ((x)/(3b))
\frac{2-m}{3}      ((2-m)/(3))
\frac{7}{5+y}      ((7)/(5+y))

Here is what I have tried so far:

import re

fraction_re = re.compile(r"\\frac{(.*?)}{(.*?)}")

def parser(expression):

    fractions = fraction_re.findall(expression)

    for numerator, denominator in fractions:
        pattern = r"\\frac\{%s\}\{%s\}" % (numerator, denominator)
        replace = f"(({numerator})/({denominator}))"
        expression = re.sub(pattern=pattern, repl=replace, string=expression)

    return expression

This works fine for cases one and two (see table) but fails for cases three and four. I suspect that the - and + symbols are causing issues, as they are themselves regex metacharacters.

I thought of adding some extra lines to escape them, e.g.

numerator = re.sub(pattern=r'\+', repl=r'\\+', string=numerator)

But this doesn't strike me as a good long-term strategy. I have also tried adding square brackets to the pattern variable (as regex symbols inside square brackets are normally not interpreted as such), i.e.

pattern = r"\\frac\{[%s]\}\{[%s]\}" % (numerator, denominator)

But this didn't work either.

What can I try next?

Post Script

I know that this has been asked many times on SO before (e.g. "Python Regex to Simplify LaTex Fractions", "Using Python Regex to Simplify Latex Fractions", and "Using if-then-else conditionals with Python regex replacement"), but I feel their questions are a little different from mine, and I have not been able to find an answer that helps me much.

Also, I know that out-of-the-box parsers already exist that do exactly what I want (for example: https://github.com/augustt198/latex2sympy), but I really would like to build this myself.


Solution

  • I'm not sure why you're taking a two-stage approach; as you have noted, it is causing you problems with regex metacharacters in the second stage. You could just make the substitution as you match, using re.sub:

    import re
    
    fraction_re = re.compile(r'\\frac{([^}]+)}{([^}]+)}')
    
    def parser(expression):
        return fraction_re.sub(r'((\1)/(\2))', expression)
    
    print(parser(r'\frac{1}{2}  \frac{x}{3b}   \frac{2-m}{3}   \frac{7}{5+y}'))
    

    Output

    ((1)/(2))  ((x)/(3b))   ((2-m)/(3))   ((7)/(5+y))
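
If you did want to keep the two-stage approach, the standard library's re.escape handles the metacharacter problem for you. The following is only a sketch under that assumption; the function name parser_two_stage is made up for illustration, and the replacement is passed as a function so that any backslashes in the captured text are inserted literally rather than read as escape sequences.

```python
import re

fraction_re = re.compile(r"\\frac\{(.*?)\}\{(.*?)\}")

def parser_two_stage(expression):
    """Hypothetical variant of the question's parser that escapes captured text."""
    for numerator, denominator in fraction_re.findall(expression):
        # re.escape backslash-escapes metacharacters such as + in the captures,
        # so the rebuilt pattern matches them literally.
        pattern = r"\\frac\{%s\}\{%s\}" % (re.escape(numerator), re.escape(denominator))
        replacement = f"(({numerator})/({denominator}))"
        # A callable repl is used so the replacement text is inserted as-is.
        expression = re.sub(pattern, lambda _match: replacement, expression)
    return expression

print(parser_two_stage(r"\frac{2-m}{3}   \frac{7}{5+y}"))
```

This still makes one extra pass per fraction, so the single re.sub call above remains the simpler option.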
    

    Note that it's more efficient to use [^}]+ than .*? in your regex as it will reduce backtracking.
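
One caveat worth checking: neither .*? nor [^}]+ copes with a fraction nested inside another fraction, such as \frac{\frac{1}{2}}{3}. A possible workaround, sketched here as an assumption rather than as part of the answer above, is to exclude both braces from the character class so that only innermost fractions match, and to repeat the substitution until nothing changes. The names INNER_FRACTION_RE and parse_nested are hypothetical.

```python
import re

# Assumption: excluding both { and } means only a \frac whose arguments
# contain no further braces (an innermost fraction) can match.
INNER_FRACTION_RE = re.compile(r"\\frac\{([^{}]+)\}\{([^{}]+)\}")

def parse_nested(expression):
    """Rewrite fractions innermost-first until no \\frac{..}{..} matches remain."""
    while True:
        # subn returns the new string and the number of substitutions made.
        expression, count = INNER_FRACTION_RE.subn(r"((\1)/(\2))", expression)
        if count == 0:
            return expression

print(parse_nested(r"\frac{\frac{1}{2}}{3}"))
```

Each pass resolves one level of nesting, so the loop runs once per nesting level plus a final pass that finds nothing. Arguments containing other braced commands (for example \sqrt{x}) would still block a match and need separate handling.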