I thought I understood pyparsing's logic, but I cannot figure out why the example below is failing.
I'm trying to parse free-text comments where a product or set of products can be mentioned either at the beginning or at the end of the comment. Product names can also be omitted from the comment.
The output should be a list of the mentioned products and the description regarding them.
Below are some test cases. The parser is identifying everything as 'description' instead of first picking up the products (isn't that what the negative lookahead is supposed to do?)
What's wrong in my understanding?
import pyparsing as pp
products_list = ['aaa', 'bbb', 'ccc']
products = pp.OneOrMore(' '.join(products_list))
word = ~products + pp.Word(pp.alphas)
description = pp.OneOrMore(word)
comment_expr = (pp.Optional(products("product1")) + description("description") + pp.Optional(products("product2")))
matches = comment_expr.scanString("""\
aaa is a good product
I prefer aaa
No comment
aaa bbb are both good products""")
for match in matches:
    print(match)
The expected results would be:
product1: aaa, description: is a good product
product2: aaa, description: I prefer
description: No comment
product1: [aaa, bbb], description: are both good products
Pyparsing's shortcut equivalence between strings and Literals is intended to be a convenience, but sometimes it results in unexpected and unwanted behavior. In these lines:
products_list = ['aaa', 'bbb', 'ccc']
products = pp.OneOrMore(' '.join(products_list))
I'm pretty sure you wanted products to match any one of the product names. Instead, OneOrMore gets passed this as its argument:
' '.join(products_list)
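This is purely a string expression, which Python evaluates to the single string "aaa bbb ccc" before OneOrMore ever sees it. Passing this to OneOrMore, you are saying that products is one or more instances of the exact string "aaa bbb ccc". None of your comments contains that text, so ~products never fails, and every word (product names included) gets consumed as part of the description.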
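A quick check makes this visible. This is a small sketch assuming a pyparsing version that still provides the camelCase names `matches`, `parseString` and `asList` (2.x and 3.x both do); it rebuilds the question's `products` and `word` and shows that a lone product name slips straight through the lookahead:

```python
import pyparsing as pp

products_list = ["aaa", "bbb", "ccc"]

# ' '.join(...) runs first, so OneOrMore only ever receives one plain string,
# which pyparsing converts to Literal("aaa bbb ccc").
joined = " ".join(products_list)
products = pp.OneOrMore(joined)
word = ~products + pp.Word(pp.alphas)

print(repr(joined))                      # 'aaa bbb ccc'
print(products.matches("aaa"))           # False: a single name is not the joined literal
print(products.matches("aaa bbb ccc"))   # True: only the exact joined text matches
print(word.parseString("aaa").asList())  # ['aaa']: ~products never fails on "aaa"
```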
To get the negative lookahead to work, change products to:
products = pp.oneOf(products_list)
or even better:
products = pp.MatchFirst(pp.Keyword(p) for p in products_list)
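One reason to prefer the Keyword version: `oneOf` (with its default settings) will match the leading `aaa` of a longer word such as `aaaa`, while `Keyword` only matches when the name is not followed by another word character. The comparison below is an illustrative sketch; `accepts` is a hypothetical helper written only for this demonstration.

```python
import pyparsing as pp

products_list = ["aaa", "bbb", "ccc"]

by_oneof = pp.oneOf(products_list)
by_keyword = pp.MatchFirst([pp.Keyword(p) for p in products_list])


def accepts(expr, text):
    """Hypothetical helper: True if expr matches at the start of text."""
    try:
        expr.parseString(text)  # parseAll defaults to False, so trailing text is allowed
        return True
    except pp.ParseException:
        return False


# Both forms accept a bare product name.
print(accepts(by_oneof, "bbb"), accepts(by_keyword, "bbb"))  # True True

# oneOf also matches the leading "aaa" of the longer word "aaaa"...
print(accepts(by_oneof, "aaaa"))    # True
# ...while Keyword refuses, because "aaa" is followed by another word character.
print(accepts(by_keyword, "aaaa"))  # False
```

Inside the lookahead `~products`, this means a description word like `aaaa` would be wrongly rejected with the `oneOf` form but accepted with the `Keyword` form.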
Then your negative lookahead will work better.
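Putting the pieces together, a minimal sketch of the whole parser might look like the following. It goes slightly beyond the answer in two assumed ways: the lookahead uses a single-product expression while `OneOrMore` of that expression captures lists such as `aaa bbb`, and each comment is parsed on its own with `parseString(..., parseAll=True)`. The second point matters because pyparsing's default whitespace skipping includes newlines, so a `scanString` over the whole block would let one description run on into the next comment. `parse_comment` is a hypothetical helper name.

```python
import pyparsing as pp

products_list = ["aaa", "bbb", "ccc"]

# A single product name, matched only as a whole word.
product = pp.MatchFirst([pp.Keyword(p) for p in products_list])
# One or more adjacent product names, e.g. "aaa bbb".
products = pp.OneOrMore(product)

# A description word is any alphabetic word that is NOT a product name.
word = ~product + pp.Word(pp.alphas)
description = pp.OneOrMore(word)

comment_expr = (
    pp.Optional(products("product1"))
    + description("description")
    + pp.Optional(products("product2"))
)


def parse_comment(line):
    """Hypothetical helper: parse one comment into plain Python values."""
    res = comment_expr.parseString(line, parseAll=True)
    out = {"description": " ".join(res["description"])}
    for key in ("product1", "product2"):
        if key in res:  # only present when that Optional actually matched
            out[key] = list(res[key])
    return out


comments = """\
aaa is a good product
I prefer aaa
No comment
aaa bbb are both good products"""

# Parse line by line so a description cannot spill over into the next comment.
for line in comments.splitlines():
    print(parse_comment(line))
```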