How to generate a parse error after exactly one expression


I have the following ANTLR grammar:

start :
    expression
;

expression
    :
    | dateOperatorExpression
    | numberOperatorExpression
    | stringOperatorExpression
    | methodBooleanExpression
    | doubleMethodOperatorExpression
    | numberInExpression
    | stringInExpression
    | bracketExpression
    | andExpression
    | orExpression
    | notExpression
    ;

numberInExpression:
    | WS? METHOD WS? IN WS? '{' WS? NUMBER (WS? ',' WS? NUMBER)* '}' WS?
    ;

stringInExpression:
    | WS? METHOD WS? IN WS? '{' WS? STRING (WS? ',' WS? STRING)* '}' WS?
    ;

dateOperatorExpression:
    | WS? DATE WS? OPERATOR WS? DATE WS?
    | WS? DATE WS? OPERATOR WS? METHOD WS?
    | WS? METHOD WS? OPERATOR WS? DATE WS?
    | WS? DATE WS? OPERATOR WS? NULLVALUE WS?
    | WS? NULLVALUE WS? OPERATOR WS? DATE WS?
    ;
numberOperatorExpression:
    | WS? NUMBER WS? OPERATOR WS? NUMBER WS?
    | WS? NUMBER WS? OPERATOR WS? METHOD WS?
    | WS? METHOD WS? OPERATOR WS? NUMBER WS?
    | WS? NUMBER WS? OPERATOR WS? NULLVALUE WS?
    | WS? NULLVALUE WS? OPERATOR WS? NUMBER WS?
    ;
stringOperatorExpression:
    | WS? STRING WS? OPERATOR WS? STRING WS?
    | WS? STRING WS? OPERATOR WS? METHOD WS?
    | WS? METHOD WS? OPERATOR WS? STRING WS?
    | WS? STRING WS? OPERATOR WS? NULLVALUE WS?
    | WS? NULLVALUE WS? OPERATOR WS? STRING WS?
    ;
doubleMethodOperatorExpression:
    | WS? METHOD WS? OPERATOR WS? METHOD WS?
    | WS? METHOD WS? OPERATOR WS? NULLVALUE WS?
    | WS? NULLVALUE WS? OPERATOR WS? METHOD WS?
    ;
methodBooleanExpression:
    | WS? METHOD WS? OPERATOR WS? BOOLEAN WS?
    | WS? BOOLEAN WS? OPERATOR WS? METHOD WS?
    ;

bracketExpression:
    | '(' WS? expression WS? ')'
    ;
andExpression
    :
    |  AND WS? '(' expression (',' expression)* WS? ')'
    ;
orExpression
    :
    | OR WS? '(' expression (',' expression)* WS? ')'
    ;
notExpression
    :
    | NOT WS? expression
    ;

WS: (' ' | '\t' | '\r' | '\n')+ -> skip;
AND: 'AND' | 'and';
OR: 'OR' | 'or';
NOT: 'NOT' | 'not' | '!';
IN: 'IN' | 'in';

OPERATOR: '==' | '!=' | '>' | '<' | '>=' | '<=' | 'ILIKE' | 'ilike' | 'LIKE' | 'like';

NULLVALUE: 'null' | 'NULL';
BOOLEAN: 'true' | 'false' | 'TRUE' | 'FALSE';
METHOD: [a-zA-Z_][a-zA-Z0-9_.]*;
NUMBER: [0-9.]+;
STRING: '"' [a-zA-Z0-9%]+ '"';
DATE: [0-9][0-9][0-9][0-9][-][0-9][0-9][-][0-9][0-9]('T'[0-9][0-9]':'[0-9][0-9]':'[0-9][0-9])?;

It allows users to specify a kind of query/predicate such as:

and((retired == true), or ((age >= 25), not(father.address.street != null)), firstName in {"John", "Pete"})

I would like to protect the user against typos. If the user accidentally adds an extra bracket, as follows:

and((retired == true)), or ((age >= 25), not(father.address.street != null)), firstName in {"John", "Pete"})

The parser only takes and((retired == true)) and ignores the rest. So I thought about simply changing the grammar as follows, and then printing a warning if multiple expressions were parsed:

start :
   (expression)+
;

But this gives the error: "rule start contains a closure with at least one alternative that can match an empty string". Why is that? How can (expression)+ match an empty string, if expression can't?

How can I achieve what I want? Thanks,


Solution

  • The parser only takes and((retired == true)) and ignores the rest.

    Yes, by default ANTLR is perfectly happy to just parse a prefix of the input and leave the rest of it in the input stream. The way to prevent that is to add EOF to the end of the start rule, which makes it report an error if it can't parse the entirety of the input.
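
    A minimal sketch of that change, reusing the rule names from the question (nothing else in the grammar is touched here):

```
start
    : expression EOF
    ;
```

    With EOF in the start rule, the input with the extra closing bracket can no longer be accepted after and((retired == true)): the parser expects the end of input there, finds more tokens instead, and reports a syntax error.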

    How can (expression)+ match an empty string, if expression can't?

    What the error message is telling you is that expression can match an empty string. It's saying that you have a loop ("closure") whose contents (i.e. expression) can match the empty string. The reason that this is an error is that, if it weren't, it could lead to an infinite loop that keeps matching the empty string.

    And the reason that expression can match the empty string is that its first alternative is empty: the rule body begins with : | so there is nothing between the colon and the first |.
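
    As a sketch of the fix, dropping the leading | removes the empty alternative. Note that the other rules in the question's grammar (numberInExpression, dateOperatorExpression, bracketExpression, and so on) begin with the same leading |, so they presumably need the same change before expression can no longer match the empty string. Only two rules are shown here:

```
expression
    : dateOperatorExpression
    | numberOperatorExpression
    | stringOperatorExpression
    | methodBooleanExpression
    | doubleMethodOperatorExpression
    | numberInExpression
    | stringInExpression
    | bracketExpression
    | andExpression
    | orExpression
    | notExpression
    ;

numberInExpression
    : WS? METHOD WS? IN WS? '{' WS? NUMBER (WS? ',' WS? NUMBER)* '}' WS?
    ;
```

    Once every rule reachable from expression requires at least one token, (expression)+ should no longer trigger the "closure ... can match an empty string" error.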