Trailing empty string after re.split()

I have two strings where I want to isolate sequences of digits from everything else.

For example:

import re
s = 'abc123abc'
print(re.split(r'(\d+)', s))
s = 'abc123abc123'
print(re.split(r'(\d+)', s))

The output looks like this:

['abc', '123', 'abc']
['abc', '123', 'abc', '123', '']

Note that in the second case, there's a trailing empty string.

Obviously I can test for that and remove it if necessary, but it seems cumbersome, and I wondered whether the regex can be improved to account for this scenario.

Solution

You can use filter() to drop the empty string from the result:

>>> s = 'abc123abc123'
>>> re.split(r'(\d+)', s)
['abc', '123', 'abc', '123', '']

>>> list(filter(None, re.split(r'(\d+)', s)))
['abc', '123', 'abc', '123']

Thanks to @chepner, you can also do this with a list comprehension:

>>> [x for x in re.split(r'(\d+)', s) if x]
['abc', '123', 'abc', '123']
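
For context, re.split() produces an empty string whenever a match sits at the very start or very end of the input, because there is nothing on that side of the separator. The sketch below shows that the same filtering covers both ends. The helper name split_digits is made up for illustration and is not part of the re module.

```python
import re

def split_digits(s):
    # Hypothetical helper: split on runs of digits, keep the digit runs
    # (capturing group), and drop any empty pieces at either end.
    return [part for part in re.split(r'(\d+)', s) if part]

# A string that both starts and ends with digits gives two empty strings.
print(re.split(r'(\d+)', '123abc123'))  # ['', '123', 'abc', '123', '']
print(split_digits('123abc123'))        # ['123', 'abc', '123']
```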

The same approach works when the non-digit parts contain symbols or other characters:

>>> s = '&^%123abc123$#@123'
>>> list(filter(None, re.split(r'(\d+)', s)))
['&^%', '123', 'abc', '123', '$#@', '123']
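
If the goal is to avoid the empty string in the first place rather than filter it out afterwards, one possible alternative (a different technique from the split-and-filter approach above) is to match the pieces with re.findall() instead of splitting. This is only a sketch, and the helper name digit_runs is made up for illustration.

```python
import re

def digit_runs(s):
    # Hypothetical helper: return alternating runs of digits (\d+) and
    # non-digits (\D+). findall only returns actual matches, so no empty
    # strings appear at the start or end.
    return re.findall(r'\d+|\D+', s)

print(digit_runs('abc123abc123'))        # ['abc', '123', 'abc', '123']
print(digit_runs('&^%123abc123$#@123'))  # ['&^%', '123', 'abc', '123', '$#@', '123']
```

Since every character is either a digit or a non-digit, the two alternatives together cover the whole string, so nothing is lost compared with the split version.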