Tags: html, regex, replace, latex, mathjax

Regex to replace '\\' with <br>, but only outside certain tags


I am new to regex and have been struggling for a while on this one: I want to transform LaTeX files to HTML.

I use MathJax to render equations and some JavaScript replace functions to convert the tags. I am nearly finished, but I still have an issue with line breaks: I need to transform \\ into <br>, but only outside \begin{array} ... \end{array} blocks.

Example: in the excerpt below, only the \\ before "Montrer l'equivalence" should be replaced.

$M=\left(
\begin{array}{c|c}
A &B \\ \hline
C &D \\ 
\end{array}
\right)$

$in$ $\mathcal{M}_{n}(\mathbb{K})$ avec $A$ $\in$ $\mathcal{M}_{r}(\mathbb{K})$ inversible.\\ Montrer l'equivalence:
\[
\Bigl( rg(A) = rg(M)  \Bigr) \Leftrightarrow \Bigl( D = CA^{-1}B \Bigr)
\]


\begin{enumerate} 
\item Calculer $detB$ en fontion de $A$. 
\item En déduire que $detB \geqslant 0$.
\end{enumerate}

$M=
\left(
\begin{array}{c|c}
A &B \\ \hline
C &D \\ 
\end{array}
\right)$

How can I do this with a regex?

EDIT: I have found here a handy regex tester...


Solution

  • You can use this pattern in a global replace (note the g flag) with a callback function that returns the first capture group when it matched, or <br> when it is empty:

    /(\\begin{array}(?:[^\\]+|\\(?!end{array}))*\\end{array})|\\\\/g
    

    The idea is to match a whole \begin{array}...\end{array} block before trying \\, so that a \\ inside such a block is consumed as part of the block and never matched on its own.

    Detail of the inner group:

    (?:                   # open a non-capturing group
        [^\\]+            # all characters but \ 1 or more times
      |                   # OR
        \\(?!end{array})  # \ not followed by "end{array}"
    )*                    # close non-capturing group, zero or more times
    

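    This structure is more efficient than a simple lazy .*?, which has to test for \end{array} after every single character (and which, in JavaScript, would also need [\s\S] in place of . to cross line breaks unless the s flag is set). The pattern is a bit longer, but it consumes each run of non-backslash characters in one step.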

    (PS: remove the / delimiters, and any flag after them, when testing in RegexPal.)
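
The replace-with-callback approach described above could look roughly like the sketch below. This is only an illustration, not the answerer's exact code: the function name replaceLineBreaks and the shortened sample string are made up for the example, and the g flag is there on the assumption that every \\ in the input should be processed.

```javascript
// Pattern from the answer. The g flag makes replace() visit every match,
// not just the first one.
const pattern = /(\\begin{array}(?:[^\\]+|\\(?!end{array}))*\\end{array})|\\\\/g;

// Hypothetical helper name, chosen for this example only.
function replaceLineBreaks(latex) {
  return latex.replace(pattern, (match, arrayBlock) =>
    // Group 1 is defined only when a whole \begin{array}...\end{array}
    // block was matched: return it untouched.
    // Otherwise the match is a bare \\ outside any array block.
    arrayBlock !== undefined ? arrayBlock : "<br>"
  );
}

// Shortened version of the sample from the question.
// String.raw keeps the backslashes exactly as typed.
const sample = String.raw`$M=\left(
\begin{array}{c|c}
A &B \\ \hline
C &D \\
\end{array}
\right)$ inversible.\\ Montrer l'equivalence:`;

console.log(replaceLineBreaks(sample));
```

On this sample the array block comes back unchanged, and only the \\ before "Montrer l'equivalence" is turned into <br>. One caveat with this kind of pattern: a \begin{array} that has no matching \end{array} is not treated as a block, so any \\ after it is replaced like one outside an array.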