I have
s = '10001001110100000'
I want to extract all matches ('0'
s between '1'
s including the '1'
s from the edges). The result should be [10001, 1001, 101]
for this example.
I coded a simple expression using PyParsing but I'm surprised how difficult it is to find a solution since PyParsing is returning the first match only.
My code so far:
from pyparsing import Group, OneOrMore, ZeroOrMore
s = '10001001110100000'
expr = ('1' + OneOrMore('0') + '1')
rule = ZeroOrMore(Group(expr))
print(rule.parseString(str).asList())
Which yields:
[['1', '0', '0', '0', '1']]
Expected result:
['10001', '1001', '101']
How to get other matches?
This question is specific to PyParsing.
A naive approach is to loop and keep track of the last "1"
as you move through the list:
s = '10001001110100000'
res = []
last_i = s.find('1')
for i in range(last_i, len(s)):
if s[i] == '1':
if i - last_i > 1:
res.append(s[last_i:i+1])
last_i = i
print(res) # => ['10001', '1001', '101']
Regex is not suitable for tasks such as this because the matches overlap but PyParsing
appears to have an overlap option on the ParserElement#scanString
method:
from pyparsing import Group, OneOrMore, ZeroOrMore
s = '10001001110100000'
rule = ZeroOrMore(Group(('1' + OneOrMore('0') + '1')))
print(list(rule.scanString(s, overlap=True)))