I'm using https://github.com/mrabarnett/mrab-regex (via pip install regex), but experiencing a failure here:
import regex
pattern_string = r'''
(?&N)
^ \W*? ENTRY \W* (?P<entries> (?&Range) ) (?&N)
(?(DEFINE)
(?P<Decimal>
[ ]*? \d+ (?:[.,] \d+)? [ ]*?
)
(?P<Range>
(?&Decimal) - (?&Decimal) | (?&Decimal)
#(?&d) (?: - (?&d))?
)
(?P<N>
[\s\S]*?
)
)
'''
flags = regex.MULTILINE | regex.VERBOSE #| regex.DOTALL | regex.V1 #| regex.IGNORECASE | regex.UNICODE
pattern = regex.compile(pattern_string, flags=flags)
bk2 = f'''
ENTRY: 0.0975 - 0.101
'''.strip()
match = pattern.match('ENTRY: 0.0975 - 0.101')
match.groupdict()
gives:
{'entries': '0.0975', 'Decimal': None, 'Range': None, 'N': None}
It misses the second value.
> pip show regex
Name: regex
Version: 2022.1.18
Summary: Alternative regular expression module, to replace re.
Home-page: https://github.com/mrabarnett/mrab-regex
Author: Matthew Barnett
Author-email: [email protected]
License: Apache Software License
Location: ...
Requires:
Required-by:
> python --version
Python 3.10.0
The problem is that the spaces you defined in the Decimal group pattern are consumed, and the DEFINE patterns are atomic: although the trailing [ ]*? part is lazy and can match zero times, once the group call has matched, there is no going back into it. In your input, the first (?&Decimal) call stops right after 0.0975 without taking the following space, so the literal - cannot match, the first alternative fails, and the second alternative (a single (?&Decimal)) matches only 0.0975.
You can check this if you put the Decimal pattern into an atomic group and compare two patterns, cf. this regex demo and this regex demo. The pattern
(?mx)^\W*?ENTRY\W*(?P<entries>(?>[ ]*? \d+ (?:[.,] \d+)? [ ]*?) - (?>[ ]*? \d+ (?:[.,] \d+)? [ ]*?) | (?>[ ]*? \d+ (?:[.,] \d+)? [ ]*?))
exposes the same behavior as your regex with the DEFINE block, while
(?mx)^\W*?ENTRY\W*(?P<entries>[ ]*? \d+ (?:[.,] \d+)? [ ]*? - [ ]*? \d+ (?:[.,] \d+)? [ ]*? | [ ]*? \d+ (?:[.,] \d+)? [ ]*?)
finds the match correctly.
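The atomic-group comparison described above can also be reproduced locally. This is a minimal sketch rather than a copy of the linked demos: the names ATOMIC and PLAIN are made up for illustration, and the fallback to the standard re module is an assumption that only holds on Python 3.11+, where re also understands (?>...) atomic groups.

```python
try:
    import regex  # third-party module from the question (pip install regex)
except ImportError:
    # Assumption: stdlib re supports (?>...) atomic groups from Python 3.11 on.
    import re as regex

# Decimal wrapped in atomic groups: mimics the behavior described for the DEFINE calls.
ATOMIC = r'''(?mx)^\W*?ENTRY\W*(?P<entries>
    (?>[ ]*? \d+ (?:[.,] \d+)? [ ]*?) - (?>[ ]*? \d+ (?:[.,] \d+)? [ ]*?)
  | (?>[ ]*? \d+ (?:[.,] \d+)? [ ]*?)
)'''

# Same pattern without atomic groups: the lazy [ ]*? may grow during backtracking.
PLAIN = r'''(?mx)^\W*?ENTRY\W*(?P<entries>
    [ ]*? \d+ (?:[.,] \d+)? [ ]*? - [ ]*? \d+ (?:[.,] \d+)? [ ]*?
  | [ ]*? \d+ (?:[.,] \d+)? [ ]*?
)'''

text = 'ENTRY: 0.0975 - 0.101'
atomic_result = regex.search(ATOMIC, text).group('entries')
plain_result = regex.search(PLAIN, text).group('entries')
print(atomic_result)  # 0.0975
print(plain_result)   # 0.0975 - 0.101
```

In the atomic version the trailing [ ]*? has already settled on zero spaces when the group is left, so the literal - never gets a chance to match and the engine falls through to the single-number alternative.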
The easiest fix is to move the optional space patterns into the Range group pattern.
There are other minor enhancements you might want to introduce here:
- Since you are using regex.match with the N group pattern ([\s\S]*?), you may use regex.search and remove the N pattern from the regex.
- Instead of a-b|a like patterns, you can use a more efficient optional non-capturing group approach, a(?:-b)?.

So, the regex can look like
^ \W* ENTRY \W* (?P<entries> (?&Range) )
(?(DEFINE)
(?P<Decimal>
\d+ (?:[.,] \d+)?
)
(?P<Range>
(?&Decimal)(?:\ *-\ *(?&Decimal))?
)
)
See the regex demo.
See the Python demo:
import regex
pattern_string = r'''
^ \W* ENTRY \W* (?P<entries> (?&Range) )
(?(DEFINE)
(?P<Decimal>
\d+ (?:[.,] \d+)?
)
(?P<Range>
(?&Decimal)(?:\ *-\ *(?&Decimal))?
)
)
'''
flags = regex.MULTILINE | regex.VERBOSE
pattern = regex.compile(pattern_string, flags=flags)
bk2 = '''
ENTRY: 0.0975 - 0.101
'''.strip()
match = pattern.search(bk2)
print(match.groupdict())
Output:
{'entries': '0.0975 - 0.101', 'Decimal': None, 'Range': None}
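To see how the revised pattern behaves beyond the single sample, here is a small sketch that runs it over a few made-up inputs. The sample strings and the helper name find_entries are illustrative assumptions, not taken from the question. It needs the third-party regex module, since (?(DEFINE)...) and (?&name) group calls are not available in the standard re module.

```python
import regex  # third-party: pip install regex

PATTERN = regex.compile(r'''
^ \W* ENTRY \W* (?P<entries> (?&Range) )
(?(DEFINE)
    (?P<Decimal> \d+ (?:[.,] \d+)? )
    (?P<Range> (?&Decimal) (?:\ *-\ *(?&Decimal))? )
)
''', flags=regex.MULTILINE | regex.VERBOSE)

def find_entries(text):
    """Return the text captured by the 'entries' group, or None if nothing matches."""
    match = PATTERN.search(text)
    return match.group('entries') if match else None

print(find_entries('ENTRY: 0.0975 - 0.101'))  # 0.0975 - 0.101
print(find_entries('ENTRY: 0.0975-0.101'))    # 0.0975-0.101
print(find_entries('header\nENTRY: 1,5'))     # 1,5
print(find_entries('no entry here'))          # None
```

Because the pattern is used with search and MULTILINE, the ENTRY line does not have to be the first line of the input, which is what the removed N group was compensating for.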