Python re find start and end index of group match


Python's re match objects have .start() and .end() methods. I want to find the start and end index of a group match, not of the whole match. How can I do this? Example:

>>> import re
>>> REGEX = re.compile(r'h(?P<num>[0-9]{3})p')
>>> test = "hello h889p something"
>>> match = REGEX.search(test)
>>> match.group('num')
'889'
>>> match.start()
6
>>> match.end()
11
>>> match.group('num').start()                  # just trying this. Didn't work
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'str' object has no attribute 'start'
>>> REGEX.groupindex
mappingproxy({'num': 1})                        # this is the index of the group in the regex, not the index of the group match, so not what I'm looking for.

The expected result for the example above is (7, 10).


Solution

  • You can pass a group name (or group number) to Match.start and Match.end to get the start and end positions of that group:

    >>> import re
    >>> REGEX = re.compile(r'h(?P<num>[0-9]{3})p')
    >>> test = "hello h889p something"
    >>> match = REGEX.search(test)
    >>> match.start('num')
    7
    >>> match.end('num')
    10
    

    An advantage of this approach over str.index, as suggested in other answers, is that it still gives the correct position when the group's text occurs more than once in the string.
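
To make the str.index pitfall concrete, here is a minimal sketch. The input string is a made-up variation of the one in the question, with an extra "889" placed before the real match. It also uses Match.span, which takes a group name and returns the start and end as one tuple.

```python
import re

REGEX = re.compile(r'h(?P<num>[0-9]{3})p')

# Hypothetical input: "889" also appears on its own before the real match.
test = "889 hello h889p something"

match = REGEX.search(test)

# Match.span accepts a group name (or number) and returns (start, end)
# for that group, equivalent to (match.start('num'), match.end('num')).
group_span = match.span('num')

# str.index returns the first occurrence of the text anywhere in the
# string, which here is not where the group actually matched.
naive_start = test.index(match.group('num'))

print(group_span)   # (11, 14)
print(naive_start)  # 0
```

With the original string from the question, match.span('num') gives the expected (7, 10).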