
Python (Perl-type) regex lookahead/lookbehind

Consider a string s = "aa,bb11,22 , 33 , 44,cc , dd ".

I would like to split s into the following list of tokens using Python's re module, whose regex syntax is similar to Perl's:

"aa,bb11"
"22"
"33"
"44,cc , dd "

Note:

I want to tokenise on commas, but only on commas that have a digit on both sides (ignoring any whitespace around the comma).
Any (optional) whitespace around these "numerical commas" that I'm targeting should be removed in the result. The optional whitespace may be more than a single space.
Any other whitespace should be left as it appears in the original string.

My best attempt so far is the following:

import re

pattern = r'(?<=\d)(\s*),(\s*)(?=\d)'
s = 'aa,bb11,22 , 33 , 44,cc , dd '

print(re.compile(pattern).split(s))

but this prints:

['aa,bb11', '', '', '22', ' ', ' ', '33', ' ', ' ', '44,cc , dd ']

which is close to what I want, inasmuch as the four tokens I want are contained in the list. I could go through and discard the empty strings and the strings that consist only of whitespace, but I'd rather have a single regex that does all this for me.

Any ideas?

Solution

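Don't put capture groups around the \s*. When the pattern contains capturing groups, re.split also returns the text matched by each group, which is where the extra empty and whitespace-only strings come from: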

pattern = r'(?<=\d)\s*,\s*(?=\d)'
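For completeness, here is a minimal runnable sketch of the fix applied to the string from the question. The variable names tokens and captured are only illustrative; the second split repeats the original capturing-group pattern so the two results can be compared side by side.

```python
import re

# The lookbehind and lookahead assert a digit on each side without consuming it;
# \s*,\s* consumes the comma together with any surrounding whitespace.
pattern = re.compile(r'(?<=\d)\s*,\s*(?=\d)')

s = 'aa,bb11,22 , 33 , 44,cc , dd '
tokens = pattern.split(s)
print(tokens)  # ['aa,bb11', '22', '33', '44,cc , dd ']

# With capturing groups, re.split also returns whatever each group matched,
# which produces the extra '' and ' ' entries seen in the question.
captured = re.split(r'(?<=\d)(\s*),(\s*)(?=\d)', s)
print(captured)
```

Commas such as the one in "44,cc" are left alone because the lookahead requires a digit immediately after the optional whitespace.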