
Perl regex for MathJax syntax


I'm having trouble writing a Perl regex that rewrites the \ character according to these rules:

  1. A matching sequence should start with \(
  2. It should end with \)
  3. Every \ character in the matched sequence, including the backslashes of \( and \) themselves, should be replaced with a double backslash \\
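Taken literally, the three rules can also be met in two steps instead of one regex: find each \( ... \) block with a lazy match, then double every backslash inside it through an /e replacement. This is a swapped-in technique, not the single-regex approach the question asks about; the helper names double_in_blocks and double_backslashes are hypothetical, and the sketch assumes blocks never nest. The sample is the first three lines of the example text:

```perl
use strict;
use warnings;

# Hypothetical helper: return a copy of $block with every \ doubled.
sub double_backslashes {
    my ($block) = @_;          # copy the argument before running another regex
    $block =~ s/\\/\\\\/g;     # each \ becomes \\
    return $block;
}

# Hypothetical helper: apply double_backslashes to each \( ... \) block only.
# Assumes blocks do not nest.
sub double_in_blocks {
    my ($text) = @_;
    # Lazy .*? stops at the first closing \) ; /s lets a block span lines;
    # /e evaluates the replacement as a Perl expression.
    $text =~ s/(\\\(.*?\\\))/double_backslashes($1)/gse;
    return $text;
}

# Single-quoted heredocs keep every backslash literal.
my $input = <<'END';
Se la \probabilità dell'evento\ A è \(\frac{3}{4} \) e la
probabilità dell'evento B è \(\frac{1}{4}\)
\(\frac{3}{4} +\frac{3}{4}\) .
END

my $expected = <<'END';
Se la \probabilità dell'evento\ A è \\(\\frac{3}{4} \\) e la
probabilità dell'evento B è \\(\\frac{1}{4}\\)
\\(\\frac{3}{4} +\\frac{3}{4}\\) .
END

my $output = double_in_blocks($input);
print $output;
```

The lazy .*? is what keeps each match to a single block; a greedy .* would run from the first \( to the last \) on the input and also double the stray backslashes in between.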

Example input:

Se la \probabilità dell'evento\ A è \(\frac{3}{4} \) e la
probabilità dell'evento B è \(\frac{1}{4}\) 
\(\frac{3}{4} +\frac{3}{4}\) .
\(\frac{1}{4} - \frac{3}{4}\) .
\(\frac{3}{16}\) .
\(\frac{1}{2}\) .

Should become:

Se la \probabilità dell'evento\ A è \\(\\frac{3}{4} \\) e la
probabilità dell'evento B è \\(\\frac{1}{4}\\) 
\\(\\frac{3}{4} +\\frac{3}{4}\\) .
\\(\\frac{1}{4} - \\frac{3}{4}\\) .
\\(\\frac{3}{16}\\) .
\\(\\frac{1}{2}\\) .

So far this is my best bet:

s/(\\\()(.*)(\\)(.*)(\\\))/\\\\\($2\\\\$4\\\\\)/mg

which produces:

Se la \probabilità dell'evento\ A è \\(\\frac{3}{4} \\) e la
probabilità dell'evento B è \\(\\frac{1}{4}\\) 
\\(\frac{3}{4} +\\frac{3}{4}\\) .
\\(\frac{1}{4} - \\frac{3}{4}\\) .
\\(\\frac{3}{16}\\) .
\\(\\frac{1}{2}\\) .

As you can see, these two lines are wrong: the first \frac in each still has a single backslash.

\\(\frac{3}{4} +\\frac{3}{4}\\) .
\\(\frac{1}{4} - \\frac{3}{4}\\) .

How can I modify my regex to meet these requirements?
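Those two lines fail because the pattern has only one group, (\\), for a backslash between \( and \), so each match doubles the backslashes of the delimiters plus exactly one interior backslash. Since the first (.*) is greedy, that group takes the last interior backslash, and the first \frac passes through untouched inside $2. Lines with a single \frac happen to come out right. A small sketch that reruns the question's pattern on one line and prints the captures:

```perl
use strict;
use warnings;

# One line from the sample, in single quotes so \( and \f stay literal.
my $line = '\(\frac{3}{4} +\frac{3}{4}\) .';

# Same pattern as the question; show what each group captured.
if ($line =~ /(\\\()(.*)(\\)(.*)(\\\))/) {
    print "\$2 = $2\n";    # greedy: still holds the first \frac unchanged
    print "\$4 = $4\n";    # text after the one backslash that gets doubled
}

# Same substitution as the question: only one interior backslash is doubled.
(my $output = $line) =~ s/(\\\()(.*)(\\)(.*)(\\\))/\\\\\($2\\\\$4\\\\\)/mg;
print "$output\n";
```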


Solution

  • Posting an updated version of my original regex.

    The original ran its validation lookahead after every escape it matched.
    On review, it can be sped up by doing the validation only once, when it
    finds the opening \( of a block.

    At the bottom is a benchmark that compares the two methods.

    Updated regex:

    $str =~ s/(?s)(?:(?!\A)\G(?!\))[^\\]*\K\\|\\(?=\(.*?\\\)))/\\\\/g;
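As a complete script, the updated substitution can be checked against the question's example. A minimal sketch: the sample goes into a single-quoted heredoc (<<'END') so the backslashes stay literal, and the trailing space on the second line is dropped:

```perl
use strict;
use warnings;

# The question's sample; <<'END' performs no escape processing at all.
my $str = <<'END';
Se la \probabilità dell'evento\ A è \(\frac{3}{4} \) e la
probabilità dell'evento B è \(\frac{1}{4}\)
\(\frac{3}{4} +\frac{3}{4}\) .
\(\frac{1}{4} - \frac{3}{4}\) .
\(\frac{3}{16}\) .
\(\frac{1}{2}\) .
END

my $expected = <<'END';
Se la \probabilità dell'evento\ A è \\(\\frac{3}{4} \\) e la
probabilità dell'evento B è \\(\\frac{1}{4}\\)
\\(\\frac{3}{4} +\\frac{3}{4}\\) .
\\(\\frac{1}{4} - \\frac{3}{4}\\) .
\\(\\frac{3}{16}\\) .
\\(\\frac{1}{2}\\) .
END

# The updated regex, unchanged: \G continues inside a block,
# and the lookahead validates a new block once at its opening \(.
$str =~ s/(?s)(?:(?!\A)\G(?!\))[^\\]*\K\\|\\(?=\(.*?\\\)))/\\\\/g;

print $str;
print $str eq $expected ? "matches expected output\n" : "DIFFERS from expected output\n";
```

The stray backslashes in \probabilità and dell'evento\ are left alone, because neither is followed by ( and the \G branch can only continue from a previous match inside a block.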

    Formatted and tested:

     (?s)               # Dot-All modifier
     (?:                # Cluster start
          (?! \A )           # Not beginning of string
          \G                 # G anchor - If matched before, start at end of last match
          (?! \) )           # Last was an escape, so ')' ends the block
          [^\\]*             # Zero or more non-escapes
          \K                 # Previous is not part of match
          \\                 # A lone escape
       |                   # or,
                             # New Block Check - 
          \\                 # A lone escape then,
          (?=                # One time Validation:
               \(                 #  an opening '('
               .*?                #  anything
               \\ \)              #  then a final '\)'
          )                  # -------------
     )                  # Cluster end
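The annotated layout is close to what Perl accepts directly: with the /x modifier, whitespace and # comments inside the pattern are ignored. A sketch of the same substitution written that way, with /s as a flag in place of the inline (?s) and braces as delimiters (so the comments must not contain braces):

```perl
use strict;
use warnings;

my $str = '\(\frac{3}{4} +\frac{3}{4}\) .';

# Same substitution as the one-line version; /x ignores whitespace and
# comments in the pattern, /s replaces the inline (?s).
$str =~ s{
    (?:
        (?! \A )          # not at the beginning of the string
        \G                # start where the last match ended
        (?! \) )          # last match was an escape, so ')' ends the block
        [^\\]*            # zero or more non-backslash characters
        \K                # keep everything so far out of the match
        \\                # a lone backslash
      |                   # or, a new block:
        \\                # a lone backslash, then
        (?=               # one-time validation:
            \(            #   an opening '('
            .*?           #   anything
            \\ \)         #   then a final '\)'
        )
    )
}{\\\\}gsx;

print "$str\n";
```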
    

    Benchmark:

    Sample \( \\\\\\\\\\\\\\\\\\\\\\\\\\\\\ \)

    Results

    New Regex:   (?s)(?:(?!\A)\G(?!\))[^\\]*\K\\|\\(?=\(.*?\\\)))
    Options:  < none >
    Completed iterations:   50  /  50     ( x 1000 )
    Matches found per iteration:   31
    Elapsed Time:    1.25 s,   1253.92 ms,   1253924 µs
    
    
    Old Regex:   (?s)(?:(?!\A)\G[^\\]*\K\\|\\(?=\())(?=.*?(?<=\\)\))
    Options:  < none >
    Completed iterations:   50  /  50     ( x 1000 )
    Matches found per iteration:   31
    Elapsed Time:    3.95 s,   3952.31 ms,   3952307 µs
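The figures above appear to come from a regex tool's built-in benchmark rather than from Perl. A comparable check can be run in Perl with the core Benchmark module; this sketch assumes the sample is \( followed by 29 backslashes and \) (inferred from the 31 matches reported), and its timings will vary with the machine and Perl version:

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Assumed sample: "\( ", 29 backslashes, " \)" (31 escapes in total).
my $sample = '\( ' . ('\\' x 29) . ' \)';

my $new = qr/(?s)(?:(?!\A)\G(?!\))[^\\]*\K\\|\\(?=\(.*?\\\)))/;
my $old = qr/(?s)(?:(?!\A)\G[^\\]*\K\\|\\(?=\())(?=.*?(?<=\\)\))/;

# s///g returns the number of substitutions made.
my $new_out   = $sample;
my $new_count = ($new_out =~ s/$new/\\\\/g);
my $old_out   = $sample;
my $old_count = ($old_out =~ s/$old/\\\\/g);
print "new: $new_count substitutions, old: $old_count substitutions\n";

# Time each substitution on a fresh copy of the sample, 50_000 runs each.
cmpthese(50_000, {
    new => sub { (my $s = $sample) =~ s/$new/\\\\/g },
    old => sub { (my $s = $sample) =~ s/$old/\\\\/g },
});
```

On this sample both patterns should make 31 substitutions and produce the same string; cmpthese then prints a rate table comparing the two.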