python regex regex-group

Extract a prefix and multiple subsequent matches


My Problem

I have a line that contains a prefix followed by one or more matches of a pattern. For example, the prefix is a single letter, and it is followed by one or more numbers separated by spaces:

s='A 3 4 5'

I would like to find a regex pattern that extracts both the prefix and every one of the repeated matches.

What I Have Tried

import re

s='''A 3 4 5'''
reg = re.compile(r'''
    ^(\w)       # Prefix
    (
        \s*     # Space separator
        (\d+)   # Pattern
        \s*     # Space separator
    )*
''', re.VERBOSE)
print(reg.findall(s))

However, it finds only the prefix and the last number:

[('A', '5', '5')]

The matched number appears twice because I have two groups: one containing the number itself and one containing the number together with its separators.

My Question

How can I retrieve a single prefix and multiple matched patterns separated by a given divider using a Python regex?


Solution

  • This requires a two-step approach, because Python's re module keeps only the last match of a repeated capturing group. Here's one way to do it:

    >>> import re
    >>> s='''A 3 4 5'''
    >>> outer_match = re.match(r'^(?P<prefix>\w)(?P<suffix>(\s*\d+\s*)*)', s)
    >>> outer_match.groupdict()
    {'prefix': 'A', 'suffix': ' 3 4 5'}
    

    Then to extract the suffix pieces:

    >>> prefix = outer_match.group('prefix')
    >>> suffixes = re.findall(r'\s*(?P<val>\d+)\s*', outer_match.group('suffix'))
    >>> suffixes
    ['3', '4', '5']