ANTLR4 Token is not recognized when substituted


I'm trying to modify the SQLite grammar (I'm only interested in a variant of the WHERE clause), and I keep getting a weird error when I replace the literal 'and' with its own token, K_AND.

grammar wtfql;

/*
    SQLite understands the following binary operators, in order from highest to
    lowest precedence:

    ||
    *    /    %
    +    -
    <<   >>   &    |
    <    <=   >    >=
    =    !=   <>   IS   IS NOT   IN   LIKE   GLOB   MATCH   REGEXP
    AND
    OR
*/

start : expr EOF?;

expr
 : literal_value 
  //BIND_PARAMETER
 | ( table_name '.' )? column_name
 | unary_operator expr
 | expr '||' expr
 | expr ( '*' | '/' | '%' ) expr
 | expr ( '+' | '-' ) expr
 | expr ( '<' | '<=' | '>' | '>=' ) expr
 | expr ( '=' | '<>' | K_IN ) expr
 | expr K_AND expr
 | expr K_OR expr
 | function_name '(' ( expr ( ',' expr )* )? ')'
 | '(' expr ')'
 | expr K_NOT  expr
 | expr ( K_NOT K_NULL )
 | expr K_NOT? K_IN ( '(' ( expr ( ',' expr )* ) ')' )
 ;


unary_operator
 : '-'
 | '+'
 | K_NOT
 ;

literal_value
 : NUMERIC_LITERAL
 | STRING_LITERAL
 | K_NULL
 ;

function_name
 : IDENTIFIER
 ;

table_name 
 : any_name
 ;

column_name 
 : any_name
 ;

any_name
 : IDENTIFIER 
 | keyword
// | '(' any_name ')'
 ;

keyword
 : K_AND 
 | K_NOT 
 | K_NULL 
 | K_IN
 | K_OR
 ;

IDENTIFIER
 : [a-zA-Z_] [a-zA-Z_0-9]* // TODO check: needs more chars in set
 ;

NUMERIC_LITERAL
 : DIGIT+ ( '.' DIGIT* )? ( E [-+]? DIGIT+ )?
 | '.' DIGIT+ ( E [-+]? DIGIT+ )?
 ;

STRING_LITERAL
 : '\"' ( ~'\"' | '\"\"' )* '\"'
 ;

SPACES
 : [ \u000B\t\r\n] -> channel(HIDDEN)
 ;

DOT : '.';
OPEN_PAR : '(';
CLOSE_PAR : ')';
COMMA : ',';
STAR : '*';
PLUS : '+';
MINUS : '-';
TILDE : '~';
DIV : '/';
MOD : '%';
AMP : '&';
PIPE : '|';
LT : '<';
LT_EQ : '<=';
GT : '>';
GT_EQ : '>=';
EQ : '=';
NOT_EQ2 : '<>';

K_AND : A N D;
K_NOT : N O T;
K_NULL : N U L L;
K_OR : O R;
K_IN : I N;

fragment DIGIT : [0-9];

fragment A : [aA];
fragment B : [bB];
fragment C : [cC];
fragment D : [dD];
fragment E : [eE];
fragment F : [fF];
fragment G : [gG];
fragment H : [hH];
fragment I : [iI];
fragment J : [jJ];
fragment K : [kK];
fragment L : [lL];
fragment M : [mM];
fragment N : [nN];
fragment O : [oO];
fragment P : [pP];
fragment Q : [qQ];
fragment R : [rR];
fragment S : [sS];
fragment T : [tT];
fragment U : [uU];
fragment V : [vV];
fragment W : [wW];
fragment X : [xX];
fragment Y : [yY];
fragment Z : [zZ];

writing

 | expr K_AND expr

with the input

field1=1 and field2 = 2

results in

line 1:8 mismatched input 'and' expecting {<EOF>, '||', '*', '+', '-', '/', '%', '<', '<=', '>', '>=', '=', '<>', K_AND, K_NOT, K_OR, K_IN}

while

 | expr 'and' expr

works like a charm:

$ antlr4 wtfql.g4 && javac -classpath /usr/local/Cellar/antlr/4.4/antlr-4.4-complete.jar  wtfql*.java && cat test.txt | grun wtfql start -tree -gui

(start (expr (expr (expr (column_name (any_name feld1))) = (expr (literal_value 1))) and (expr (expr (column_name (any_name feld2))) = (expr (literal_value 2)))) <EOF>)

What am I missing?


Solution

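  • "and" is lexed as an IDENTIFIER. Both IDENTIFIER and K_AND match the input "and" with the same length, and on such a tie the ANTLR lexer picks the rule that is defined first. IDENTIFIER comes before K_AND in the grammar and thus wins, which is why the parser reports 'and' as mismatched input while listing K_AND among the expected tokens.

    If you write 'and' in the parser rule, this implicitly creates a new token (not K_AND!) that is placed before IDENTIFIER and thus wins.

    Rule of thumb: put more specific lexer rules first, and don't create new lexer tokens implicitly in parser rules.

    If you check the token type (for example by running grun with -tokens instead of -tree), you'll see what's going on.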
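
    A minimal sketch of the reordering, assuming the rest of the grammar stays as posted: only the position of the rules changes, and the fragment rules (A, N, D, ...) can stay where they are.

    ```antlr
    // Keyword rules first: for the input "and", K_AND and IDENTIFIER match the
    // same number of characters, so the rule defined earlier wins the tie.
    K_AND  : A N D;
    K_NOT  : N O T;
    K_NULL : N U L L;
    K_OR   : O R;
    K_IN   : I N;

    // IDENTIFIER after the keywords, so it only catches names that are not keywords.
    IDENTIFIER
     : [a-zA-Z_] [a-zA-Z_0-9]* // TODO check: needs more chars in set
     ;
    ```

    With this order the parser rule can stay as expr K_AND expr. A longer name such as android should still come out as a single IDENTIFIER, because the lexer prefers the longest match before it looks at rule order.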