
Trailing empty string after re.split()


I have two strings where I want to isolate sequences of digits from everything else.

For example:

import re
s = 'abc123abc'
print(re.split(r'(\d+)', s))
s = 'abc123abc123'
print(re.split(r'(\d+)', s))

The output looks like this:

['abc', '123', 'abc']
['abc', '123', 'abc', '123', '']

Note that in the second case, there's a trailing empty string.

Obviously, I can test for that and remove it if necessary, but that seems cumbersome, and I wondered whether the regex can be improved to handle this case.
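For reference, the regex itself can avoid the empty strings if you stop splitting on the digits and instead match both kinds of run with re.findall() and the alternation \d+|\D+. This is a different technique from the split-based answer below, and the helper name split_digit_runs is only illustrative. Because every match has to consume at least one character, the result cannot contain '':

```python
import re

def split_digit_runs(s):
    # Illustrative helper (not from the original answer): match maximal
    # runs of digits (\d+) or of non-digits (\D+). Each match consumes at
    # least one character, so no empty strings appear at either end.
    return re.findall(r'\d+|\D+', s)

print(split_digit_runs('abc123abc'))     # ['abc', '123', 'abc']
print(split_digit_runs('abc123abc123'))  # ['abc', '123', 'abc', '123']
print(split_digit_runs('123abc'))        # ['123', 'abc']
```

Since the pattern has no capturing group, findall() returns the whole matches, which here are exactly the alternating runs.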


Solution

  • You can use filter() with None as the function. It keeps only truthy items, so the empty string is dropped:

    >>> s = 'abc123abc123'
    >>> re.split(r'(\d+)', s)
    ['abc', '123', 'abc', '123', '']
    
    >>> list(filter(None, re.split(r'(\d+)', s)))
    ['abc', '123', 'abc', '123']
    

    Thanks to @chepner, here is the equivalent list comprehension:

    >>> [x for x in re.split(r'(\d+)', s) if x]
    ['abc', '123', 'abc', '123']
    

    This also works when the string contains symbols or other non-digit characters:

    >>> s = '&^%123abc123$#@123'
    >>> list(filter(None, re.split(r'(\d+)', s)))
    ['&^%', '123', 'abc', '123', '$#@', '123']
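The filter approach can be wrapped in a small reusable helper; the names DIGITS and split_keep_digits below are hypothetical. Because it removes every empty string, it also drops the leading '' that re.split() returns when the input starts with digits:

```python
import re

# Hypothetical module-level pattern; the capturing group makes
# re.split() keep the digit runs in the result.
DIGITS = re.compile(r'(\d+)')

def split_keep_digits(s):
    # Hypothetical helper: split on digit runs, keep them, and drop the
    # empty strings re.split() emits when a match touches either end.
    return [part for part in DIGITS.split(s) if part]

print(split_keep_digits('abc123abc123'))  # ['abc', '123', 'abc', '123']
print(split_keep_digits('123abc'))        # ['123', 'abc']
```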