My implementation of infixNotation is running slower than I would like, even after using enablePackrat, which greatly increased performance.
The parser needs to recognize the following types of strings and produce the nested lists shown:
prefix::dotted.alphanum.string -> [prefix::dotted.alphanum.string]
pow(some::var + 2.3, 5) -> [pow, [[some::var, +, 2.3], 5]]
The code I'm using:
def parse_expression(expr_str):
    fraction = Combine("." + Word(nums))
    number = Combine(Word(nums) + Optional(fraction)).setParseAction(str_to_num)
    event_id_expr = Word(alphanums + "_") + "::"
    dotted_columns = Combine(Word(alphanums + "_") + Optional("."))
    column_expr = Combine(event_id_expr + OneOrMore(dotted_columns))
    arith_expr = infixNotation(column_expr | number, [
        (Word(alphanums + "_"), 1, opAssoc.RIGHT),
        ("-", 1, opAssoc.RIGHT),
        (oneOf("* /"), 2, opAssoc.LEFT),
        (oneOf("+ -"), 2, opAssoc.LEFT),
        (Literal(","), 2, opAssoc.LEFT)
    ])
    parsed_expr = arith_expr.parseString(expr_str).asList()[0]
    return parsed_expr

def str_to_num(t):
    num_str = t[0]
    try:
        return int(num_str)
    except ValueError:
        return float(num_str)
Are there any changes I can make that would result in substantial performance improvements? The structures I'm parsing are fairly simple, but they're in batches. On average each string is taking ~5.3ms.
It looks like you are "fudging" the functions as if they were operators. I think you are better off moving function calls into the operand expression for infixNotation:
def parse_expression(expr_str):
    number = pyparsing_common.number()
    event_id_expr = Word(alphas+"_", alphanums + "_") + "::"
    dotted_columns = Combine(Word(alphas+"_", alphanums + "_") + Optional("."))
    column_expr = Combine(event_id_expr + OneOrMore(dotted_columns))
    func_name = Word(alphas+"_", alphanums+'_')
    LPAR, RPAR = map(Suppress, "()")
    arith_expr = Forward()
    func_call = Group(func_name('name')
                      + LPAR
                      + Group(Optional(delimitedList(arith_expr)))("args")
                      + RPAR)
    arith_expr <<= infixNotation(number | func_call | column_expr, [
        ("-", 1, opAssoc.RIGHT),
        (oneOf("* /"), 2, opAssoc.LEFT),
        (oneOf("+ -"), 2, opAssoc.LEFT),
    ])
    parsed_expr = arith_expr.parseString(expr_str)[0]
    return parsed_expr
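For reference, here is a minimal self-contained sketch of the same grammar with the imports it needs, run against the two example strings from the question. Two things in it are assumptions rather than part of the code above: the grammar is built once in a hypothetical build_parser() helper and kept in a module-level ARITH_EXPR, instead of being rebuilt on every call, and the result is converted with asList() as in the question's version. Whether hoisting the grammar helps noticeably on your batches is something you would need to measure.

```python
from pyparsing import (
    Combine, Forward, Group, OneOrMore, Optional, ParserElement, Suppress,
    Word, alphanums, alphas, delimitedList, infixNotation, oneOf, opAssoc,
    pyparsing_common,
)

# Packrat memoization, as already used in the question.
ParserElement.enablePackrat()


def build_parser():
    """Build the grammar once (hypothetical helper, not from the answer)."""
    number = pyparsing_common.number()
    event_id_expr = Word(alphas + "_", alphanums + "_") + "::"
    dotted_columns = Combine(Word(alphas + "_", alphanums + "_") + Optional("."))
    column_expr = Combine(event_id_expr + OneOrMore(dotted_columns))

    func_name = Word(alphas + "_", alphanums + "_")
    LPAR, RPAR = map(Suppress, "()")

    # Forward declaration so function arguments can be full expressions.
    arith_expr = Forward()
    func_call = Group(
        func_name("name")
        + LPAR
        + Group(Optional(delimitedList(arith_expr)))("args")
        + RPAR
    )
    # Function calls are operands here, not operators.
    arith_expr <<= infixNotation(number | func_call | column_expr, [
        ("-", 1, opAssoc.RIGHT),
        (oneOf("* /"), 2, opAssoc.LEFT),
        (oneOf("+ -"), 2, opAssoc.LEFT),
    ])
    return arith_expr


# Assumption: one shared parser instance reused for the whole batch.
ARITH_EXPR = build_parser()


def parse_expression(expr_str):
    return ARITH_EXPR.parseString(expr_str).asList()[0]


if __name__ == "__main__":
    print(parse_expression("prefix::dotted.alphanum.string"))
    # ['pow', [['some::var', '+', 2.3], 5]]
    print(parse_expression("pow(some::var + 2.3, 5)"))
```

Note that a bare column reference comes back as a plain string rather than a one-element list, because of the [0] index on the parse result.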
I also modified most of your identifiers to use the two-argument form of Word. Just using Word(alphanums+"_") would also match ordinary integers, which I don't think is your intent. If I got this wrong, then just put these back as you had them.
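A quick way to see the difference between the two forms of Word is the sketch below. The names loose, strict, and matches are made up for the illustration; the first argument of the two-argument form gives the allowed initial characters and the second the allowed body characters.

```python
from pyparsing import ParseException, Word, alphanums, alphas

# One-argument form: any run of letters, digits, or underscores.
loose = Word(alphanums + "_")

# Two-argument form: must start with a letter or underscore,
# then may continue with letters, digits, or underscores.
strict = Word(alphas + "_", alphanums + "_")


def matches(parser, text):
    """Return True if parser accepts the whole of text (illustrative helper)."""
    try:
        parser.parseString(text, parseAll=True)
        return True
    except ParseException:
        return False


if __name__ == "__main__":
    for text in ("var_1", "123"):
        print(text, matches(loose, text), matches(strict, text))
```

With the one-argument form, "123" is accepted as an identifier; with the two-argument form it is rejected, so it can only be parsed as a number.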