I want to parse:
'APPLE BANANA FOO TEST BAR'
into:
[['APPLE BANANA'], 'FOO', ['TEST BAR']]
Here is my latest attempt:
from pyparsing import Group, Keyword, Word, ZeroOrMore, alphas

to_parse = 'APPLE BANANA FOO TEST BAR'
words = Word(alphas)
foo = Keyword("FOO")
parser = Group(ZeroOrMore(words + ~foo)) + foo + Group(ZeroOrMore(words))
result = parser.parseString(to_parse)
But it raises the following error:
> raise ParseException(instring, loc, self.errmsg, self)
E pyparsing.ParseException: Expected "FOO" (at char 6), (line:1, col:7)
I think the problem comes from ZeroOrMore(words + ~foo), which is "too greedy". According to a few questions on SO, the solution is to use negation with ~foo, but it doesn't work in this case. Any help would be appreciated.
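You are definitely on the right track. You just need to do the negative lookahead of foo before parsing each words match, not after it. In your version, the lookahead only runs once a word has already been consumed, so the iteration that reads BANANA and then sees FOO fails as a whole and is rolled back. That leaves the parser at BANANA (char 6), where it expects FOO. Putting the lookahead first fixes this: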
parser = Group(ZeroOrMore(~foo + words)) + foo + Group(ZeroOrMore(words))
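To see the two orderings side by side, here is a minimal sketch. The names late, early, late_failed and early_result are only illustrative, and it assumes a pyparsing version that still accepts the camelCase parseString and asList names used in the question.

```python
from pyparsing import Group, Keyword, ParseException, Word, ZeroOrMore, alphas

to_parse = 'APPLE BANANA FOO TEST BAR'
words = Word(alphas)
foo = Keyword("FOO")

# Lookahead AFTER the word: the word is consumed first, then ~foo is checked.
late = Group(ZeroOrMore(words + ~foo)) + foo + Group(ZeroOrMore(words))

# Lookahead BEFORE the word: ~foo is checked first, so FOO is never consumed as a word.
early = Group(ZeroOrMore(~foo + words)) + foo + Group(ZeroOrMore(words))

try:
    late.parseString(to_parse)
    late_failed = False
except ParseException as exc:
    # The BANANA iteration is rolled back, so FOO is expected where BANANA starts.
    late_failed = True
    print("late ordering failed:", exc)

early_result = early.parseString(to_parse).asList()
print(early_result)
```

The only difference between the two parsers is the position of ~foo inside the repetition.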
In recent pyparsing releases, I added a stopOn argument to ZeroOrMore and OneOrMore that does the same thing, to make this less error-prone:
parser = Group(ZeroOrMore(words, stopOn=foo)) + foo + Group(ZeroOrMore(words))
With this change I get:
>>> result.asList()
[['APPLE', 'BANANA'], 'FOO', ['TEST', 'BAR']]
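For completeness, here is the stopOn version as a self-contained sketch. It also shows one possible way to get the joined strings asked for in the question. The post-processing step is plain Python, not a pyparsing feature, and the name joined is only illustrative. The sketch assumes a pyparsing release that accepts the stopOn keyword; I believe newer releases also spell it stop_on, but check your installed version.

```python
from pyparsing import Group, Keyword, Word, ZeroOrMore, alphas

to_parse = 'APPLE BANANA FOO TEST BAR'
words = Word(alphas)
foo = Keyword("FOO")

# stopOn ends the repetition as soon as FOO is the next thing in the input.
parser = Group(ZeroOrMore(words, stopOn=foo)) + foo + Group(ZeroOrMore(words))
result = parser.parseString(to_parse).asList()
print(result)

# Plain-Python post-processing: collapse each nested word list into one string.
joined = [[' '.join(item)] if isinstance(item, list) else item for item in result]
print(joined)
```

Joining after the parse keeps the grammar itself unchanged, which makes it easy to switch between the two output shapes.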