python regex latex

How to use a regular expression to remove all math expressions in a LaTeX file


Suppose I have a string that contains part of a LaTeX file. How can I use Python's re module to remove every math expression from it?

e.g:

text = r"This is an example $$a \text{$a$}$$. How to remove it? Another random math expression $\mathbb{R}$..."

I would like my function to return ans="This is an example . How to remove it? Another random math expression ...".

Thank you!


Solution

  • Try this Regex:

    (\$+)(?:(?!\1)[\s\S])*\1
    

    Explanation:

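    • (\$+) - matches 1+ occurrences of $ and captures them in Group 1
    • (?:(?!\1)[\s\S])* - matches 0+ characters, one at a time, checking before each one that the text at that position does not begin with the contents of Group 1; [\s\S] matches any character, including a newline
    • \1 - matches the contents of Group 1 again, i.e. the closing delimiter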

    Replace each match with an empty string.

    As suggested by @torek, the opening delimiter should not match 3 or more consecutive $, so the expression becomes (\${1,2})(?:(?!\1)[\s\S])*\1
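
For reference, here is a minimal sketch of how the final expression might be applied with Python's re module. The names MATH_RE and remove_math are illustrative and do not come from the answer. The sketch assumes math is delimited only by $...$ or $$...$$, so escaped dollar signs (\$) and environments such as \( ... \) or \begin{equation} are not handled.

```python
import re

# Final pattern from the answer: an opening delimiter of one or two dollar
# signs, then any characters up to the next occurrence of that same delimiter.
MATH_RE = re.compile(r"(\${1,2})(?:(?!\1)[\s\S])*\1")


def remove_math(text: str) -> str:
    """Return text with every $...$ and $$...$$ span replaced by an empty string."""
    return MATH_RE.sub("", text)


# A raw string keeps the LaTeX backslashes literal (otherwise "\t" is a tab).
text = r"This is an example $$a \text{$a$}$$. How to remove it? Another random math expression $\mathbb{R}$..."
print(remove_math(text))
```

On the string from the question this should print the sentence with both math spans removed. The inner $a$ disappears together with the surrounding $$...$$ because the lookahead only stops at the two-dollar delimiter that opened the span.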