
re.split on an empty string


I'm curious about the results of this Python code, which splits an empty string '':

import re

x = re.split(r'\W*', '')
y = re.split(r'(\W*)', '')

Since the input is an empty string, I expected x = re.split(r'\W*', '') to be an empty list and y = re.split(r'(\W*)', '') to be [''].

The actual results are ['', ''] for x = re.split(r'\W*', '') and ['', '', ''] for y = re.split(r'(\W*)', '').

I don't know what leads to these results.


Solution

  • Note that the regular expression \W* can match the empty string. So, while it isn't useful, it is valid to split the empty string around an empty match, which yields three empty pieces:

    '' = '' + '' + ''
    
    1. The '' that precedes the match
    2. The '' that the regular expression matches
    3. The '' that follows the match

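    In the first case (no capturing group), re.split returns strings 1 and 3: ['', ''].

    In the second case, the capturing group in (\W*) makes re.split also return string 2, the matched text: ['', '', ''].

    (In general, it's probably never a good idea to pass a regular expression that can match the empty string as the first argument of re.split.)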