Search code examples
antlr4

Parsing if-then-else takes longer and longer the more clauses that exists


Our formula grammar supports an if-then-else construct. Here is a snippet of the grammar:

parse
 : block EOF
 ;

block
 : statement*
 ;

statement
 : if_statement
 | setfunctions
 | blockcomment
 | comment
 ;

if_statement
 : comment* IF condition_block (ELSE IF condition_block)* (ELSE statement_block)?
 ;

condition_block
 : expression statement_block
 ;

statement_block
 : OBRACE (block | setfunctions*) CBRACE
 ;

setfunctions
 : setsumtags
 | setavgtags
 ;

The formula is by users to set calculated values for tags (objects) in our system. When we evaluate the formula, we do the following:

  1. Create the lexer.
  2. Create the parser.
  3. Create the visitor.
  4. Create the parse tree by calling parse() on the parser.
  5. Call Visit on the visitor with the parse tree.

I have noticed that the more else-if clauses, the slower step 4 gets.

E.g. creating the parser for a script with

if (....)
{
    SETSUMTAGS(....)
}
else if (....)
{
    SETSUMTAGS(....)
}

is twice as fast as a script with two additional else if. Although I did notice it is not linear (which would make sense) and the reason for the question.

Is that to be expected? Are we doing something wrong?

This affects our system startup time, and we have 100s or even 1000s of formulas that need evaluating.

If this is to be expected, is there a way to save/load a precompiled version of the parse tree? We could then save the compiled version at script creation time, and load it at run-time.


Solution

  • We ended up using a caching scheme so that all tags with the same script share the same parser, lexer and visitor.