I have an app that includes a 3 operator (& | !) boolean expression evaluator, with variables and constants. Generally the expressions aren't too long (perhaps 50 terms at the most, but usually a lot less). There can be very many expressions - I'm expecting the upper limit to be around a million. Currently I have a hand written parser with a very simple evaluator that simply recursively traverses the parse tree. One constraint is that this has to be callable from C++. I have no sharing between expressions. I'd like to investigate speeding this up.
I see two avenues of research.
Also I would expect that a code generation approach will be faster than an interpretive approach working on parse trees or similar structures. It would probably be fairly straightforward to generate some C++ code, but considering the length of the functions, I don't know if a compiler like GCC will be able to optimize the CSEs.
I've seen that there are a few libraries available for expression evaluation, but in my work environment adding 3rd party libraries is not simple plus they all seem very complicated compared to my needs.
Lastly I've been looking at Antlr4 a bit recently, so that might be appropriate for me. In the past I've worked on C code generation, but I have no experience of using something like LLVM for optimisation and code generation.
Any suggestions for which way to go?
As far as I understood, your question is more about faster expression evaluation than it about faster expression parsing. So my answer will focus on the former. Parsing, after all, should not be the bottleneck as your expression language looks simple enough to implement a manually tuned parser for it.
So, to accelerate your evaluations, you can consider JIT execution of your formulas using LLVM. That is, given your formula F
you can (relatively) easily generate corresponding LLVM IR and directly evaluate it. This SMT solver does just that. IR code generation is implemented in a single C++ class here.
Note that the boolean expressions you mentioned are a subset of the SMT language supported by that solver. Additionally, you can easily adjust how aggressive the LLVM optimizer needs to be.
However, IR generation and optimization has its overhead. Therefore, in case a given formula is not evaluated often enough to amortize the initial overhead, then I would recommend direct interpretation instead. You can look in this case for opportunities to find structural similarities and common subexpressions.