With pyparsing, I need to write a matcher for expressions like
a + names + c
with
a = pp.OneOrMore(pp.Word(pp.alphas))
c = pp.OneOrMore(pp.Word(pp.nums))
and names matching one of many entries in the string list names_list.
The two complications are:
- Entries in names_list can contain spaces.
- names_list is rather large (~20000 entries).
I tried
names_kw_list = [pp.Keyword(name, caseless=True) for name in names_list ]
names = pp.Or(names_kw_list)
This does not work for entries with spaces, and I'm also worried that it is not a very performant way to write this.
Any ideas on how to get this working for entries with spaces, and maybe make it faster?
A partial answer:
The spaces problem can be solved with a suitable stopOn expression, built here by a small helper function:
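import pyparsing as pp

def last_occurrence_of(expr):
    # Match expr only if no further occurrence of expr appears later in the input.
    return expr + ~pp.FollowedBy(pp.SkipTo(expr))

names_kw_list = [pp.Keyword(word, caseless=True)
                 for word in names_list]
names = pp.Or(names_kw_list)("names")
# Stop collecting words for A at the last occurrence of a name.
a = pp.OneOrMore(pp.Word(pp.alphas), stopOn=last_occurrence_of(names))("A")
c = pp.OneOrMore(pp.Word(pp.nums))("C")
expr = a + names + c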
This instructs a not to eat into the strings of names.
However, the performance deteriorates, because the long list of names is now used in a stopOn expression, which is checked before every word that a tries to match.
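One possible way to reduce that cost, sketched here under assumptions and not measured on the real data: collapse the whole list into a single case-insensitive pp.Regex instead of an Or over ~20000 Keyword objects, and replace last_occurrence_of with a cheaper lookahead that treats a name as the boundary only when a number follows it (which is what a + names + c requires anyway). The helper build_names and the three sample names are hypothetical. The sketch also assumes that names never need to match across irregular whitespace.

```python
import re
import pyparsing as pp

def build_names(names_list):
    """Hypothetical helper: one regex alternation for all names."""
    # Longest entries first, so "new york city" is tried before "new york".
    ordered = sorted(set(names_list), key=len, reverse=True)
    body = "|".join(re.escape(name) for name in ordered)
    # Lookarounds approximate Keyword's whole-word behaviour.
    pattern = r"(?<!\w)(?:" + body + r")(?!\w)"
    return pp.Regex(pattern, flags=re.IGNORECASE)

# Sample data for illustration only; the real list has ~20000 entries.
names_list = ["new york", "new york city", "boston"]

names = build_names(names_list)("names")
number = pp.Word(pp.nums)

# Assumption: a name ends A only when a number follows it directly,
# so an earlier mention of a name can still be part of A.
a = pp.OneOrMore(pp.Word(pp.alphas), stopOn=names + pp.FollowedBy(number))("A")
c = pp.OneOrMore(number)("C")
expr = a + names + c

result = expr.parseString("flight to New York City 12 34")
print(result.asList())
```

The stopOn expression is still evaluated before every word, but each check is now a single compiled regex match plus a short lookahead, rather than a scan over the rest of the input with SkipTo. Whether this is fast enough for the full list is worth timing on real input.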