Tags: python, regex, replace, python-re

Confusing example from the Python re module


From the Python docs:

re.sub(pattern, repl, string, count=0, flags=0)

...

The optional argument count is the maximum number of pattern occurrences to be replaced; count must be a non-negative integer. If omitted or zero, all occurrences will be replaced. Empty matches for the pattern are replaced only when not adjacent to a previous empty match, so sub('x*', '-', 'abxd') returns '-a-b--d-'.
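A quick check of the documented example (this assumes Python 3.7 or later, the version in which this empty-match behavior was introduced). re.subn is included because it also reports how many replacements were made:

```python
import re

# Replace every match of x* (empty or not) with "-".
print(re.sub('x*', '-', 'abxd'))           # -a-b--d-

# re.subn returns (new_string, number_of_replacements).
print(re.subn('x*', '-', 'abxd'))          # ('-a-b--d-', 5)

# count=2 stops after the first two matches (the empty ones at indices 0 and 1).
print(re.sub('x*', '-', 'abxd', count=2))  # -a-bxd
```

The replacement count is 5, not 6, which is the number to explain.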

So I would expect x* to match:

  1. The empty string before a
  2. The empty string between a and b
  3. The empty string between b and x
  4. The substring 'x'
  5. The empty string between x and d
  6. The empty string after d

Evidently (5) is not replaced, but I can't see why. If the word "empty" were removed from the phrase "not adjacent to a previous empty match", I could see why (5) would not be replaced. But (5) is not adjacent to a previous empty match.
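One way to compare the list above with what the regex engine actually does is to print every match with re.finditer. This is a minimal sketch that assumes Python 3.7 or later, where an empty match may directly follow a non-empty one:

```python
import re

# Collect every match of x* in 'abxd' as (start, end, matched text).
matches = [(m.start(), m.end(), m.group()) for m in re.finditer('x*', 'abxd')]
for start, end, found in matches:
    print(start, end, repr(found))
```

On Python 3.7+ this prints five matches: (0, 0, ''), (1, 1, ''), (2, 3, 'x'), (3, 3, '') and (4, 4, ''). There is no empty match at index 2.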


Solution

  • "... 3. The empty string between b and x ..."

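    There is no empty match between b and x: at index 2, x* is greedy, so it consumes the x and produces one non-empty match instead of an empty one.
    The pattern effectively means "one or more x's if there are any at this position, otherwise none".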

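    For example, there is an empty match between a and b only because the character at index 1, b, is not an x, so x* matches zero characters there. The matches are therefore the empty strings at indices 0, 1, 3 and 4 plus the 'x' at indices 2 to 3: five matches, giving the five dashes in '-a-b--d-'.
    So (5) from the question is replaced after all: it is the second dash of '--'. It follows the non-empty match 'x', not an empty match, so the rule in the docs does not suppress it.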

                 -----------------
    characters   | a | b | x | d |
                 -----------------
    indices      0   1   2   3   4
    
    step  indices  substring  is x?  current string
    1     0 to 1   a          false  -abxd
    2     1 to 2   b          false  -a-bxd
    3     2 to 3   x          true   -a-b-d
    4     3 to 4   d          false  -a-b--d
    5     4                   false  -a-b--d-
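The "current string" column can be reproduced with a small sketch that rebuilds the result of re.sub one match at a time. sub_trace is a hypothetical helper written for illustration, not part of the re module; it inserts repl literally (re.sub would also process backslash escapes in it) and assumes Python 3.7+ match semantics.

```python
import re
from typing import List


def sub_trace(pattern: str, repl: str, text: str) -> List[str]:
    """Hypothetical helper: rebuild re.sub(pattern, repl, text) one match at a time.

    Returns the "current string" after each replacement. repl is inserted
    literally; unlike re.sub, no backslash escapes are processed.
    """
    pieces: List[str] = []  # output produced so far
    last_end = 0            # index where the previous match ended
    trace: List[str] = []
    for m in re.finditer(pattern, text):
        pieces.append(text[last_end:m.start()])  # unmatched text before this match
        pieces.append(repl)                      # replacement for this match
        last_end = m.end()
        # Output so far, followed by the part of the input not yet processed.
        trace.append("".join(pieces) + text[last_end:])
    return trace


for step, current in enumerate(sub_trace('x*', '-', 'abxd'), start=1):
    print(step, current)
```

Here each of the five matches happens to line up with one row of the table, and the last entry equals re.sub('x*', '-', 'abxd').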