Consider a string s = "aa,bb11,22 , 33 , 44,cc , dd ".
I would like to split s into the following list of tokens using Python's regular expressions module (re), which offers functionality similar to Perl's:
"aa,bb11"
"22"
"33"
"44,cc , dd "
Note:
My best attempt so far is the following:
import re
pattern = r'(?<=\d)(\s*),(\s*)(?=\d)'
s = 'aa,bb11,22 , 33 , 44,cc , dd '
print(re.compile(pattern).split(s))
but this prints:
['aa,bb11', '', '', '22', ' ', ' ', '33', ' ', ' ', '44,cc , dd ']
which is close to what I want, inasmuch as the four tokens I want are all contained in the list. I could go through the list and remove the empty strings and the strings that consist only of whitespace, but I'd rather have a single regex that does all of this for me.
Any ideas?
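For comparison, the post-processing route described in the question could look roughly like the sketch below. It keeps the original capturing pattern and simply drops every element that is empty or whitespace-only. Note that this filter would also discard a genuinely empty token if the input ever produced one, which is one reason to prefer fixing the pattern itself.

```python
import re

s = 'aa,bb11,22 , 33 , 44,cc , dd '

# Original pattern: the two (\s*) groups are capturing, so re.split
# also returns their (empty or whitespace-only) contents.
pattern = r'(?<=\d)(\s*),(\s*)(?=\d)'

parts = re.split(pattern, s)
# Keep only elements that contain something other than whitespace.
tokens = [part for part in parts if part.strip()]
print(tokens)
```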
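Don't put capturing groups around the \s*. When the pattern contains capturing groups, re.split also returns the text matched by each group, which is where the extra empty and whitespace-only strings come from:
pattern = r'(?<=\d)\s*,\s*(?=\d)'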
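A complete, runnable version of the fix, as a sketch: with no capturing groups, re.split returns only the pieces between matches. If the whitespace ever needs to be grouped (for example, to apply a quantifier to it), a non-capturing group (?:...) keeps it out of the result as well.

```python
import re

s = 'aa,bb11,22 , 33 , 44,cc , dd '

# Split on a comma (with optional surrounding whitespace) only when it
# sits between two digits; no capturing groups, so nothing extra is returned.
pattern = r'(?<=\d)\s*,\s*(?=\d)'
tokens = re.split(pattern, s)
print(tokens)  # ['aa,bb11', '22', '33', '44,cc , dd ']

# Equivalent pattern using non-capturing groups, in case grouping is needed.
grouped = re.split(r'(?<=\d)(?:\s*),(?:\s*)(?=\d)', s)
print(grouped == tokens)  # True
```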