I would like to create a parser that takes in any LaTeX formatted string and returns an expression that Python can evaluate.
I am having a couple of issues with fractions. Here are some examples:
| LaTeX (input) | Interoperable String (output) |
|---|---|
| \frac{1}{2} | ((1)/(2)) |
| \frac{x}{3b} | ((x)/(3b)) |
| \frac{2-m}{3} | ((2-m)/(3)) |
| \frac{7}{5+y} | ((7)/(5+y)) |
Here is what I have tried so far:
    import re

    fraction_re = re.compile(r"\\frac{(.*?)}{(.*?)}")

    def parser(expression):
        fractions = fraction_re.findall(expression)
        for numerator, denominator in fractions:
            # Rebuild a pattern for this specific fraction, then substitute it
            pattern = r"\\frac\{%s\}\{%s\}" % (numerator, denominator)
            replace = f"(({numerator})/({denominator}))"
            expression = re.sub(pattern=pattern, repl=replace, string=expression)
        return expression
This works fine for cases one and two (see table) but has problems with cases three and four. I suspect that the `-` and `+` symbols are causing issues, as they are themselves regex metacharacters.
I thought of adding some extra lines to escape them, e.g.
    numerator = re.sub(pattern=r'\+', repl=r'\\+', string=numerator)
But this doesn't strike me as a good long-term strategy. I have also tried adding square brackets to the `pattern` variable (since regex symbols inside square brackets are not interpreted as metacharacters), i.e.

    pattern = r"\\frac\{[%s]\}\{[%s]\}" % (numerator, denominator)

But this didn't work either.
What can I try next?
I know that this has been asked many times on SO before (e.g. "Python Regex to Simplify LaTex Fractions", "Using Python Regex to Simplify Latex Fractions", and "Using if-then-else conditionals with Python regex replacement"), but I feel like those questions are a little different from mine, and I have not been able to find an answer that helps me much.
Also I know that there already exist out-of-the-box parsers that do exactly what I'd want (for example: https://github.com/augustt198/latex2sympy) but I really would like to build this myself.
I'm not sure why you're taking a two-stage approach; as you have noted, it is causing you problems with regex metacharacters in the second stage. You could just make the substitution as you match using `re.sub`:
    import re

    fraction_re = re.compile(r'\\frac{([^}]+)}{([^}]+)}')

    def parser(expression):
        return fraction_re.sub(r'((\1)/(\2))', expression)

    print(parser(r'\frac{1}{2} \frac{x}{3b} \frac{2-m}{3} \frac{7}{5+y}'))
Output:

    ((1)/(2)) ((x)/(3b)) ((2-m)/(3)) ((7)/(5+y))

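If you did want to keep the two-stage approach from the question, the standard way to neutralise metacharacters in captured text is `re.escape`, rather than escaping each symbol by hand. The following is only a sketch of that idea; the function name `parser_two_stage` is made up for illustration, and the regex is the one from the question.

```python
import re

# Same pattern as in the question
fraction_re = re.compile(r"\\frac{(.*?)}{(.*?)}")

def parser_two_stage(expression):
    """Hypothetical variant of the question's parser using re.escape."""
    for numerator, denominator in fraction_re.findall(expression):
        # re.escape makes characters such as + match literally
        pattern = r"\\frac\{%s\}\{%s\}" % (
            re.escape(numerator),
            re.escape(denominator),
        )
        replace = f"(({numerator})/({denominator}))"
        # A callable replacement is inserted as-is, with no backslash processing
        expression = re.sub(pattern, lambda m: replace, expression)
    return expression

print(parser_two_stage(r"\frac{2-m}{3} \frac{7}{5+y}"))
```

This still does the work twice (once to find, once to replace), so the single `re.sub` call is simpler.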
Note that it's more efficient to use `[^}]+` than `.*?` in your regex, as it will reduce backtracking.
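One limitation to be aware of: `[^}]+` does not cope with nested fractions such as `\frac{\frac{1}{2}}{3}`. A possible workaround, sketched here under the assumption that the only braces in the input belong to `\frac` arguments, is to exclude both brace characters and repeat the substitution until nothing changes, so that the innermost fractions are rewritten first. The names `inner_fraction_re` and `parse_nested` are illustrative, not part of the code above.

```python
import re

# Only matches fractions whose arguments contain no braces at all,
# i.e. the innermost fractions (assumption: braces come only from \frac)
inner_fraction_re = re.compile(r"\\frac\{([^{}]+)\}\{([^{}]+)\}")

def parse_nested(expression):
    """Rewrite fractions from the inside out until none remain."""
    while True:
        # subn returns the new string and the number of substitutions made
        expression, count = inner_fraction_re.subn(r"((\1)/(\2))", expression)
        if count == 0:
            return expression

print(parse_nested(r"\frac{\frac{1}{2}}{3}"))
```

Inputs with other braced constructs (for example `\sqrt{...}`) would need a real parser rather than a single regex.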