python regex regex-greedy

Does this regex fail, or do I need to modify the regex to support "optional followed by"?


I am trying the following regex: https://regex101.com/r/5dlRZV/1/. I am aware that I am testing it there with \author and not \maketitle.

In Python, I tried the following:

import re

# A raw triple-quoted string is needed for a multi-line literal.
text = r'''
\author{
\small 
}

\maketitle
'''

regex = [
    re.compile(r'[\\]author*|[{]((?:[^{}]*|[{][^{}]*[}])*)[}]', re.M | re.S),
    re.compile(r'[\\]maketitle*|[{]((?:[^{}]*|[{][^{}]*[}])*)[}]', re.M | re.S),
]

for p in regex:
    for m in p.finditer(text):
        print(m.group())

Python freezes. I suspect that this has something to do with my pattern and that SRE, Python's regex engine, fails on it.

EDIT: Is there something wrong with my regex? Can it be improved to actually work? I still get the same result on my machine.

EDIT 2: Can this be fixed so that the pattern supports an optional "followed by" part, using a (?:...) group or a (?=...) lookahead, so that one can capture both?


Solution

  • After reading the section "Parentheses Create Numbered Capturing Groups" on https://www.regular-expressions.info/brackets.html, I managed to find the answer, which is:

    Besides grouping part of a regular expression together, parentheses also create a 
    numbered capturing group. It stores the part of the string matched by the part of 
    the regular expression inside the parentheses.
    
    The regex Set(Value)? matches Set or SetValue. 
    In the first case, the first (and only) capturing group remains empty. 
    In the second case, the first capturing group matches Value.
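
For what it's worth, here is a minimal sketch of how the quoted idea might carry over to Python and to the \author / \maketitle case from the question. The names set_re and command_re, the sample text, and the limit of one level of nested braces are my own assumptions, not part of the quoted page. Note that in Python an optional group that did not take part in the match is reported as None rather than as an empty string.

```python
import re

# The quoted example: "Value" is an optional capturing group.
set_re = re.compile(r'Set(Value)?')

print(set_re.fullmatch('Set').group(1))       # None: the group did not participate
print(set_re.fullmatch('SetValue').group(1))  # Value

# Hypothetical application to the question: a command name followed by an
# optional brace argument, allowing one level of nested braces (assumption).
# [^{}] and \{[^{}]*\} can never start on the same character, so the repeated
# group has only one way to consume any given input.
command_re = re.compile(
    r'\\(author|maketitle)\b(?:\{((?:[^{}]|\{[^{}]*\})*)\})?'
)

text = r'''
\author{
\small
}

\maketitle
'''

# group(1) is the command name, group(2) the brace content or None.
results = [(m.group(1), m.group(2)) for m in command_re.finditer(text)]
for name, arg in results:
    print(name, repr(arg))
```

With this shape a single pattern covers both commands: \author comes back with its brace content in group 2, and \maketitle comes back with group 2 set to None. The sketch also drops the inner * from (?:[^{}]*|...)*, since a quantifier nested inside another quantifier can backtrack very heavily when a brace is left unclosed, which may be related to the freeze described in the question.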