Search code examples
pythonpython-3.xpyparsing

Retrieve several overlapping matches from a string in PyParsing


I have

s = '10001001110100000'

I want to extract all matches ('0's between '1's including the '1's from the edges). The result should be [10001, 1001, 101] for this example.

I coded a simple expression using PyParsing but I'm surprised how difficult it is to find a solution since PyParsing is returning the first match only.

My code so far:

from pyparsing import Group, OneOrMore, ZeroOrMore

s = '10001001110100000'
expr = ('1' + OneOrMore('0') + '1')
rule = ZeroOrMore(Group(expr))
print(rule.parseString(str).asList())

Which yields:

[['1', '0', '0', '0', '1']]

Expected result:

['10001', '1001', '101']

How to get other matches?

This question is specific to PyParsing.


Solution

  • A naive approach is to loop and keep track of the last "1" as you move through the list:

    s = '10001001110100000'
    res = []
    last_i = s.find('1')
    
    for i in range(last_i, len(s)):
        if s[i] == '1':
            if i - last_i > 1:
                res.append(s[last_i:i+1])
    
            last_i = i
    
    print(res) # => ['10001', '1001', '101']
    

    Regex is not suitable for tasks such as this because the matches overlap but PyParsing appears to have an overlap option on the ParserElement#scanString method:

    from pyparsing import Group, OneOrMore, ZeroOrMore
    
    s = '10001001110100000'
    rule = ZeroOrMore(Group(('1' + OneOrMore('0') + '1')))
    print(list(rule.scanString(s, overlap=True)))