Problem
In LaTeX, delimiters like (…), […], and {…} can grow to fit the size of the equation they enclose if you add \left and \right before the opening and closing delimiter, respectively, such as \left( <equation> \right).
However, these must come in pairs: whenever you introduce a \left, it needs a matching \right, otherwise LaTeX raises an error. But sometimes only the opening or only the closing delimiter is needed; this can be solved by writing \left. (\left dot) or \right. (\right dot) in place of the missing partner, such as \left( <equation> \right. or \left. <equation> \right).
Question: How can I automatically insert the missing pair?
Example Input:
\begin{align}
\left( content & content \right) content \left( content \left( content \right) \\
content \right) \left(content \left( content \\
content \right)
\end{align}
\begin{align}
\left( content & content \right) content \left( content \left( content \right) \nonumber \\
content \right) \left( content \left( content \nonumber \\
content \right)
\end{align}
Output should be:
\begin{align}
\left( content [\right.] & [\left.] content \right) content \left( content \left( content \right) [\right.] \\
[\left.] content \right) \left( content \left( content [\right.] [\right.] \\
[\left.] content \right)
\end{align}
\begin{align}
\left( content [\right.] & [\left.] content \right) content \left( content \left( content \right) [\right.] \nonumber \\
[\left.] content \right) \left( content \left( content [\right.] [\right.] \nonumber \\
[\left.] content \right)
\end{align}
The ones between the square-brackets should be automatically generated (without the square brackets).
1. If a \left is not paired, add \right. just before the end of the current segment, i.e. before the &, \\, or \end{} that closes it. The number of \right. must equal the number of \left without a pair. If there is a \nonumber before the end of the segment, add the \right. before the \nonumber tag.
2. If a \right is not paired, add \left. at the start of the current segment, i.e. right after the &, \\, or \begin{} that opens it. The number of \left. must equal the number of \right without a pair.

It might be better to approach this problem more generally with the aid of a proper LaTeX parser. However, if you're looking to tackle this specific problem in Python as you've stated it, below is some code that will do the job.
For the code to work out of the box, you only need to replace the contents of the snippet variable with your string of interest.
The code assumes that you are trying to balance single- or multi-line equations within align blocks, and that your snippet is an uninterrupted (other than by whitespace) series of such blocks, as in the example. It also strips and rearranges whitespace inside your equations, so make sure that is acceptable.
import re
from collections import deque

snippet: str = r"""
\begin{align}
\left( content & content \right) content \left( content \left( content \right) \\
content \right) \left(content \left( content \\
content \right)
\end{align}
\begin{align}
\left( content & content \right) content \left( content \left( content \right) \nonumber \\
content \right) \left( content \left( content \nonumber \\
content \right)
\end{align}
"""

# regex to capture stuff within the align blocks
re_align = re.compile(r'\\begin\{align\}(.*?)\\end\{align\}', flags=re.DOTALL)

# left bracket patterns
re_parens_left = re.compile(r'\\left\(', flags=re.DOTALL)
re_braces_left = re.compile(r'\\left\\\{', flags=re.DOTALL)
re_square_left = re.compile(r'\\left\[', flags=re.DOTALL)

# right bracket patterns
re_parens_right = re.compile(r'\\right\)', flags=re.DOTALL)
re_braces_right = re.compile(r'\\right\\\}', flags=re.DOTALL)
re_square_right = re.compile(r'\\right\]', flags=re.DOTALL)

# line break (\\) and \nonumber patterns
re_break = re.compile(r'[\s]*\\\\[\s]*', flags=re.DOTALL)
re_nonum = re.compile(r'\\nonumber', flags=re.DOTALL)


# function that does the balancing for a column string; invoked by main loop below
def balance(string: str, re_left: re.Pattern, re_right: re.Pattern) -> str:
    """
    for a given bracket type, identify all occurrences of the current bracket,
    and balance them using the standard stack-based algorithm; Python collections'
    'deque' data structure serves the purpose of a stack here.
    """
    re_either = re.compile(re_left.pattern + '|' + re_right.pattern, flags=re.DOTALL)
    match_list = deque(re_either.findall(string))
    if len(match_list) == 0:
        return string  # early exit if no brackets => no balancing needed

    balance_stack = deque()
    for item in match_list:
        if re_left.match(item): current_bracket = 'l'
        elif re_right.match(item): current_bracket = 'r'
        else: raise ValueError(f"got problematic bracket '{item}' in 'balance'")
        previous_bracket = balance_stack[-1] if len(balance_stack) > 0 else None
        if (previous_bracket == 'l') and (current_bracket == 'r'):
            balance_stack.pop()
        else:
            balance_stack.append(current_bracket)

    # whatever's left on the stack is the imbalance
    remaining = ''.join(balance_stack)
    imbalance_left = remaining.count('l')
    imbalance_right = remaining.count('r')
    balance_string_left = ' ' + ' '.join([r'\right.'] * imbalance_left) if imbalance_left > 0 else ''
    balance_string_right = ' '.join([r'\left.'] * imbalance_right) + ' ' if imbalance_right > 0 else ''

    # move any \nonumber to the end of the column, after the inserted \right.
    nonum_match = False if re_nonum.search(string) is None else True
    result = re_nonum.sub('', string)
    nonum_string = ' \\nonumber ' if nonum_match else ''
    result = balance_string_right + result + balance_string_left + nonum_string
    return result


# main loop
result_equations = []
for equation in re_align.findall(snippet):
    lines = re_break.split(equation.strip())  # split on double backslash
    result_lines = []
    for line in lines:
        columns = line.strip().split('&')
        result_columns = []
        for column in columns:
            # balance brackets using the stack algorithm
            result_column = column.strip()
            # for each type of bracket () or \{\} or [], return the balanced string
            result_column = balance(result_column, re_parens_left, re_parens_right)
            result_column = balance(result_column, re_braces_left, re_braces_right)
            result_column = balance(result_column, re_square_left, re_square_right)
            result_columns.append(result_column)
        result_line = ' & '.join(result_columns)
        result_lines.append(result_line)
    result_equation = '\\begin{align}\n ' + ' \\\\\n '.join(result_lines) + '\n\\end{align}'
    result_equations.append(result_equation)

result = '\n\n'.join(result_equations)
print(result)
How the code works
The code relies on Python's re (regular expressions) library to identify patterns of interest. The first part of the code compiles the bracket and other patterns that we expect to work with.
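As a hedged aside on the pattern-compiling step: instead of hand-escaping each bracket regex, the patterns could be built with re.escape. The sketch below uses the hypothetical names PAIRS and delimiter_tokens, which are not part of the code above; it shows that a combined left|right pattern returns the delimiters in the order they appear in the text, which is what the stack-based pass relies on.

```python
import re

# Hypothetical table of delimiter pairs (not part of the answer's code).
PAIRS = {
    "parens": (r"\left(", r"\right)"),
    "square": (r"\left[", r"\right]"),
    "braces": (r"\left\{", r"\right\}"),
}


def delimiter_tokens(text: str, kind: str) -> list:
    """Hypothetical helper: list the \\left/\\right tokens of one kind, in order."""
    left, right = PAIRS[kind]
    # re.escape turns the literal LaTeX into a safe regex (backslashes, brackets).
    pattern = re.compile(re.escape(left) + "|" + re.escape(right))
    # findall returns the matches in the order they occur in the text.
    return pattern.findall(text)


print(delimiter_tokens(r"a \left( b \right) \left\{ c", "parens"))
```

The same combined pattern is what balance builds internally from re_left.pattern and re_right.pattern.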
Next comes the main loop: the input string snippet is broken down hierarchically, first by align equation block, then by line (split on \\) within each equation, and finally by column (delimited by &) within each line.
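The hierarchical split can be looked at in isolation. This minimal sketch reuses the answer's align-block and line-break regexes; the function name split_hierarchy is hypothetical, and it only splits, without balancing anything.

```python
import re

# Same block and line-break patterns as re_align and re_break in the answer.
ALIGN = re.compile(r"\\begin\{align\}(.*?)\\end\{align\}", flags=re.DOTALL)
LINE_BREAK = re.compile(r"\s*\\\\\s*")


def split_hierarchy(snippet: str) -> list:
    """Hypothetical helper: equations -> lines -> stripped column strings."""
    return [
        [
            [column.strip() for column in line.split("&")]  # columns within a line
            for line in LINE_BREAK.split(body.strip())      # lines within an equation
        ]
        for body in ALIGN.findall(snippet)                  # align blocks
    ]


example = r"""
\begin{align}
a & b \\
c
\end{align}
"""
print(split_hierarchy(example))  # [[['a', 'b'], ['c']]]
```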
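For each column, the code balances brackets using the standard stack-based algorithm; this is done in the balance function, once for each type of bracket. When a column that contains brackets also contains \nonumber, balance removes the tag and re-appends it at the end of the column, so that any inserted \right. lands before it.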
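Because a \right can only cancel an earlier \left, the stack in balance always ends up holding some 'r' entries followed by some 'l' entries, so two counters carry the same information. The sketch below illustrates that simplification for one delimiter pair; count_unmatched is a hypothetical name, and this is an alternative formulation rather than what the answer's code does. The first count is how many \right. to append, the second how many \left. to prepend.

```python
import re
from typing import Tuple


def count_unmatched(column: str,
                    opener: str = r"\left(",
                    closer: str = r"\right)") -> Tuple[int, int]:
    """Hypothetical helper: return (openers without a closer, closers without an opener)."""
    token = re.compile(re.escape(opener) + "|" + re.escape(closer))
    open_count = 0     # openers still waiting for their closer
    stray_closers = 0  # closers that arrived when nothing was open
    for match in token.finditer(column):
        if match.group() == opener:
            open_count += 1
        elif open_count > 0:
            open_count -= 1  # the closer cancels the most recent opener
        else:
            stray_closers += 1
    return open_count, stray_closers


# Second column of the first example line: one open \left(, one stray \right)
print(count_unmatched(r"content \right) content \left( content \left( content \right)"))  # (1, 1)
```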
The code then joins the balanced columns, lines, and equations back together to produce the final result.
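Reassembling a single column, including the rule that \right. goes before a \nonumber, might look like the following. This is a hedged sketch: pad_column is a hypothetical helper that takes the two unmatched counts as input, and like the answer's code it normalizes the whitespace around \nonumber.

```python
import re

NONUMBER = re.compile(r"\s*\\nonumber\s*")


def pad_column(column: str, open_count: int, stray_closers: int) -> str:
    r"""Hypothetical helper: prepend one \left. per stray closer and append
    one \right. per open opener, keeping any \nonumber tag at the very end."""
    has_nonumber = NONUMBER.search(column) is not None
    body = NONUMBER.sub(" ", column).strip()
    parts = [r"\left."] * stray_closers + ([body] if body else []) + [r"\right."] * open_count
    if has_nonumber:
        parts.append(r"\nonumber")
    return " ".join(parts)


print(pad_column(r"\left( content \left( content \nonumber", 2, 0))
# \left( content \left( content \right. \right. \nonumber
```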
Limitations
The code is a bit cumbersome but solves the problem as you've stated it, making reasonable simplifying assumptions whenever your specification has the potential to be problematic. Cases where this will fail (not exhaustive):
\begin{align}
\left( content & content \\ % comment: the wandering explorer turned \left(
content \textup{sneaked in a \\left( payload}
\end{align}
Identifying comments and highly nested syntax with strange edge cases isn't in the scope of this code. I'd recommend staying vigilant if you plan to use this for anything material.
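For the comment part of the first failure case, one partial mitigation is to strip LaTeX comments before balancing. The sketch below assumes that a comment starts at any % that is not escaped as \%, including a % directly after a \\ line break, and runs to the end of the line; strip_comments is a hypothetical helper, and it does nothing for the \textup case or other nested syntax.

```python
import re

# An unescaped % is one preceded by an even number of backslashes (possibly zero);
# group 1 keeps those backslashes (e.g. a \\ line break) in the output.
COMMENT = re.compile(r"(?<!\\)((?:\\\\)*)%.*")


def strip_comments(tex: str) -> str:
    r"""Hypothetical helper: remove % comments up to the end of each line."""
    return COMMENT.sub(r"\1", tex)


print(strip_comments(r"\left( content & content \\ % the wandering explorer turned \left("))
```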