I am trying to pass a list of valid identifiers to the parser. That is, I have a list of identifiers that the parser should use, and I am passing it as a parameter to the constructor.
Instead of identifiers = Literal('identifier1') | Literal('identifier2') | Literal('identifier whatever')
I have an array of identifiers: identifiers = ['identifier1', 'identifier2', 'identifier whatever', ... 'identifier I can not what']
that I need to tell pyparsing to use as identifiers.
This is what I've done so far:
def __init__(self, idents):
    if isinstance(idents, list) and idents:
        for identifier in idents:
            # and this is where I got stuck
            # I tried:
            # identifiers = Literal(identifier) but this keeps only the last one
How can I achieve this?
The easiest way to convert a list of strings to a list of alternative parse expressions is to use oneOf:
import pyparsing as pp
color_expr = pp.oneOf(["red", "orange", "yellow", "green", "blue", "purple"])
# for convenience could also write as pp.oneOf("red orange yellow green blue purple")
# but since you are working with a list, I am showing code using a list
parsed_colors = pp.OneOrMore(color_expr).parseString("blue orange yellow purple green green")
# use pprint() to list out results because I am lazy
parsed_colors.pprint()
sum(color_expr.searchString("blue 1000 purple, red red swan okra kale 5000 yellow")).pprint()
Prints:
['blue', 'orange', 'yellow', 'purple', 'green', 'green']
['blue', 'purple', 'red', 'red', 'yellow']
So oneOf(["A", "B", "C"]) and the easy-button version oneOf("A B C") are the same as Literal("A") | Literal("B") | Literal("C").
One thing to be careful of with oneOf is that it does not enforce word boundaries:
pp.OneOrMore(color_expr).parseString("redgreen reduce").pprint()
will print:
['red', 'green', 'red']
even though the initial 'red' and 'green' are not separate words, and the final 'red' is just the first part of 'reduce'. This is exactly the behavior you would get from an explicit expression built up with Literals.
To enforce word boundaries, you must use the Keyword class, and now you have to use a bit more Python to build this up.
You will need to build up an Or or MatchFirst expression for your alternatives. Usually you build these up using the '^' or '|' operators, respectively. But to create one of these from a list of expressions, you would call the constructor form Or(expression_list) or MatchFirst(expression_list).
If you have a list of strings, you could just create Or(list_of_identifiers), but this would default to converting the strings to Literals, and we've already seen you don't want that.
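As a quick check of that default conversion, here is a minimal sketch that passes the plain color strings straight to Or. The variable names literal_colors and literal_result are only illustrative; the point is that the result shows the same partial-word matches as the oneOf example.

```python
import pyparsing as pp

colors = ["red", "orange", "yellow", "green", "blue", "purple"]

# Plain strings passed to Or are converted to Literal expressions
literal_colors = pp.Or(colors)

# Literals match inside larger words, just as oneOf does
literal_result = pp.OneOrMore(literal_colors).parseString("redgreen reduce").asList()
print(literal_result)
```

This prints ['red', 'green', 'red'], so word boundaries are still not enforced.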
Instead, use your strings to create Keyword expressions using a Python list comprehension or generator expression, and pass that to the MatchFirst constructor (MatchFirst will be more efficient than Or, and Keyword matching will be safe to use with MatchFirst's short-circuiting logic). The following will all work the same, with slight variations in how the sequence of Keywords is built and passed to the MatchFirst constructor:
# list comprehension
MatchFirst([Keyword(ident) for ident in list_of_identifiers])
# generator expression
MatchFirst(Keyword(ident) for ident in list_of_identifiers)
# map built-in
MatchFirst(map(Keyword, list_of_identifiers))
Here is the color matching example, redone using Keywords. Note how colors embedded in larger words are not matched now:
colors = ["red", "orange", "yellow", "green", "blue", "purple"]
color_expr = pp.MatchFirst(pp.Keyword(color) for color in colors)
sum(color_expr.searchString("redgreen reduce skyblue boredom purple 100")).pprint()
Prints:
['purple']
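To tie this back to the constructor in the question, here is one possible sketch. The class name IdentifierParser and the find_all method are hypothetical, invented only for illustration, and the choice to raise ValueError on an empty or non-list argument is an assumption about what you would want there.

```python
import pyparsing as pp


class IdentifierParser:
    """Hypothetical wrapper class; the name and methods are illustrative only."""

    def __init__(self, idents):
        # Assumption: reject anything that is not a non-empty list
        if not isinstance(idents, list) or not idents:
            raise ValueError("idents must be a non-empty list of strings")
        # One Keyword per identifier, combined into a single MatchFirst alternation
        self.identifier = pp.MatchFirst([pp.Keyword(ident) for ident in idents])

    def find_all(self, text):
        # Scan the text and collect every identifier matched as a whole word
        return [tokens[0] for tokens in self.identifier.searchString(text)]


parser = IdentifierParser(["red", "green", "blue"])
found = parser.find_all("redgreen reduce blue skyblue green")
print(found)
```

This prints ['blue', 'green']: only the standalone words match, while the colors embedded in 'redgreen', 'reduce' and 'skyblue' are skipped. Instead of rebinding a variable on each pass through a loop, which keeps only the last Literal, the list comprehension builds all the alternatives at once.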