
Python: detecting LaTeX mathematics with regular expressions or other methods


I want to detect whether a long text string (input from "somewhere") contains mathematical expressions encoded in LaTeX. This means searching for substrings (denoted ... in what follows) enclosed in any of the following:

  1. $...$
  2. \[...\]
  3. \(...\)
  4. \begin{displaymath} ... \end{displaymath}

There are some variations of item 4 with keywords other than displaymath, and there may be whitespace inside the braces, etc., but I suppose I can figure out the rest once I get (1) to (4) working.

For (1), I suppose I can do the following:

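import re
text = "the long input string"  # input from "somewhere"
# "$" must be escaped: unescaped, it anchors the end of the string
if re.search(r"\$(.+?)\$", text):
    pass  # (do something)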

But I am having problems with the others, especially those that contain a backslash (\). Any help would be appreciated.

The Python version is 2.7.12, but code that works on both 2.x and 3.x would be preferred.
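Item (1) has a pitfall of its own: $ is a regex metacharacter (an end-of-string anchor), so the unescaped pattern $(\w+)$ never matches with default flags, and \w+ would reject the spaces, braces and backslashes typical of LaTeX anyway. A minimal sketch comparing the two searches on a made-up sample string (works on Python 2.7 and 3.x):

```python
import re

sample = r"Euler: $e^{i\pi} + 1 = 0$ holds."

# Unescaped "$" anchors the end of the string (or just before a final
# newline), and "\w+" cannot match there, so this search always fails.
naive = re.search(r"$(\w+)$", sample)

# "\$" matches a literal dollar sign; the non-greedy ".+?" stops at the first
# closing "$" and, unlike "\w+", accepts spaces, braces and backslashes.
fixed = re.search(r"\$(.+?)\$", sample)

print(naive)           # None
print(fixed.group(1))  # e^{i\pi} + 1 = 0
```

The non-greedy .+? matters when a line holds several inline formulas: a greedy .+ would run from the first $ to the last one.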


Solution

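  • You need to escape \, [, ], {, }, ( and ) because they have special meanings in regular expressions ($ is special too, which matters for the first pattern).

    Add an extra \ before each of them when you want to match it literally. Since the backslash itself must be escaped, a literal \ becomes \\ in the pattern, which is why \[ is matched by \\\[. Use raw strings (r"...") so that Python does not consume the backslashes before re sees them.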

    For your second pattern, use:

    \\\[(.+?)\\\]
    

    For the third pattern, use:

    \\\((.+?)\\\)
    

    For the fourth pattern, use:

    \\begin\{displaymath\}(.+?)\\end\{displaymath\}
    

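The patterns above can be combined into one compiled expression. Below is a minimal sketch, assuming only the standard re module (Python 2.7 or 3.x). It also covers the variations mentioned in the question (other environment names, whitespace inside the braces) by capturing the environment name and requiring \end to repeat it via a named backreference. The names MATH_ENVS, ENV_PATTERN, LATEX_MATH, contains_latex_math and latex_math_spans are hypothetical, the environment list is illustrative only, and the $...$ alternative is an assumption since the answer gives no pattern for it. re.DOTALL is added because . does not match a newline by default, and display math often spans several lines.

```python
import re

# Hypothetical, illustrative list of math environments; extend as needed.
MATH_ENVS = ("displaymath", "equation", "align", "gather", "multline", "eqnarray")

# \begin{env} ... \end{env}: optional "*" and whitespace inside the braces.
# (?P=env) forces the \end name to repeat the \begin name exactly.
ENV_PATTERN = (r"\\begin\{\s*(?P<env>(?:%s)\*?)\s*\}(?P<body>.+?)"
               r"\\end\{\s*(?P=env)\s*\}") % "|".join(MATH_ENVS)

LATEX_MATH = re.compile(
    r"\$(?P<inline>.+?)\$"          # $...$
    r"|\\\[(?P<display>.+?)\\\]"    # \[...\]
    r"|\\\((?P<paren>.+?)\\\)"      # \(...\)
    r"|" + ENV_PATTERN,             # \begin{env}...\end{env}
    re.DOTALL,  # let "." cross newlines; display math often spans lines
)


def contains_latex_math(text):
    """Return True if text contains at least one delimited math span."""
    return LATEX_MATH.search(text) is not None


def latex_math_spans(text):
    """Return the stripped contents of every math span, in order."""
    spans = []
    for m in LATEX_MATH.finditer(text):
        # Exactly one of these groups participates in each match.
        for name in ("inline", "display", "paren", "body"):
            if m.group(name) is not None:
                spans.append(m.group(name).strip())
                break
    return spans


doc = r"""Inline $a^2 + b^2$ and \(x\), then
\[ \int_0^1 f(x)\,dx \]
\begin{ equation* }
E = mc^2
\end{equation*}
"""
print(contains_latex_math("no math here"))  # False
print(latex_math_spans(doc))
```

The backreference is what rejects mismatched pairs such as \begin{equation} ... \end{align}, and non-math environments like itemize are ignored because their names are not in the list. This sketch does not handle an escaped \$ in ordinary text (it is treated as a delimiter), does not distinguish $$...$$ from $...$, and does not handle nested or unbalanced delimiters.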