python string indexing slice python-re

Why does a re.Match object return an end index higher than expected?


I'm trying to understand regular expression operations, string slicing, and strings in Python.

Slicing a string with .start() and .end() returns the expected substring, and indexing a single character with .start() returns the expected character. Indexing a single character with .end(), however, returns the character after the match instead of the last character of the match.

I understand that sequences (including strings) are indexed from zero, but why do the stop index of a string slice and the re Match .end() index appear to be exceptions to this rule?

>>> import re
>>> m = re.search("bake","123bake456")
>>> m
<re.Match object; span=(3, 7), match='bake'>
>>> m.span()
(3, 7)
>>> m.start()
3
>>> m.end()
7
>>> "123bake456"[m.start():m.end()]
'bake'
>>> "123bake456"[m.start()]
'b'
>>> "123bake456"[m.end()]
'4'

Solution

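  • The end index is exclusive: a slice runs up to the stop index but does not include it. A span of (3, 7) covers indices 3 through 6, but not 7, so "123bake456"[m.end()] is the first character after the match ('4'), and the last matched character is at index m.end() - 1. This is the same convention as range(1, 100), which yields 1 through 99 but not 100. Indexing still starts at zero; only the end of the interval is excluded.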
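
As a minimal sketch of the half-open convention, the snippet below reuses the string and pattern from the question. The variable names (text, last_char, after_match, and so on) are illustrative only and do not appear in the original post.

```python
import re

text = "123bake456"
m = re.search("bake", text)

# span() returns (start, end): start is inclusive, end is exclusive.
start, end = m.span()

last_char = text[end - 1]   # last character of the match
after_match = text[end]     # first character after the match
match_len = end - start     # length of the match, with no +1 needed

# Slices that share a boundary split the string with no overlap and no gap.
before, matched, after = text[:start], text[start:end], text[end:]

print(last_char, after_match, match_len)  # e 4 4
print(before, matched, after)             # 123 bake 456
print(list(range(start, end)))            # [3, 4, 5, 6]
```

One commonly cited benefit of this convention is that end - start gives the length directly and adjacent slices such as text[:end] and text[end:] fit together exactly.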