I have a line that contains a prefix followed by one or more matched patterns. For example, the prefix is a letter followed by one or more numbers separated by spaces:
s='A 3 4 5'
I would like to find a regex pattern that would extract both the prefix and the repeated patterns.
import re
s='''A 3 4 5'''
reg = re.compile(r'''
^(\w) # Prefix
(
\s* # Space separator
(\d+) # Pattern
\s* # Space separator
)*
''', re.VERBOSE)
print(reg.findall(s))
However, it only finds the prefix and a single match:
[('A', '5', '5')]
The matched pattern appears twice because I have two groups - one containing the pattern itself and one containing the pattern with its separators.
How can I retrieve a single prefix and multiple matched patterns separated by a given divider using a Python regex?
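In Python's re module a repeated capturing group keeps only its last match, so a single pattern cannot return every repetition. This requires a two-level regex: first match the prefix together with the whole run of repeated patterns, then split that run apart. Here's one way to do it: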
>>> import re
>>> s='''A 3 4 5'''
>>> outer_match = re.match(r'^(?P<prefix>\w)(?P<suffix>(\s*\d+\s*)*)', s)
>>> outer_match.groupdict()
{'prefix': 'A', 'suffix': ' 3 4 5'}
Then extract the individual values from the suffix:
>>> prefix = outer_match.group('prefix')
>>> suffixes = re.findall(r'\s*(?P<val>\d+)\s*', outer_match.group('suffix'))
>>> suffixes
['3', '4', '5']
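If you need this in more than one place, the two steps can be wrapped in a small helper. This is only a sketch: the name split_prefixed is made up for illustration, and the trailing $ is my own addition so that lines with unexpected trailing text are rejected instead of being partially matched. Drop it if you want the original, more permissive behaviour.

```python
import re

# Hypothetical helper name; not part of the re module.
def split_prefixed(line):
    """Return (prefix, [values]), or None if the line does not match."""
    # Level 1: capture the prefix and the whole run of numbers.
    # The inner group is non-capturing because only the full run is needed.
    # The trailing $ is an assumption: it rejects lines with leftover text.
    outer = re.match(r'^(?P<prefix>\w)(?P<suffix>(?:\s*\d+\s*)*)$', line)
    if outer is None:
        return None
    # Level 2: pull each number out of the captured run.
    values = re.findall(r'\d+', outer.group('suffix'))
    return outer.group('prefix'), values

print(split_prefixed('A 3 4 5'))
```

The values come back as strings, as with re.findall in general; convert them with int() if you need numbers.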