
Generating java code using antlr input file


The rules for the source requests are described in ANTLR *.g grammar files.

I'm trying to generate Java code from them using ANTLR 4 and am getting errors like:

error(50): mql2.g4:9:7: syntax error: mismatched input ';' expecting RBRACE
error(50): mql2.g4:10:6: syntax error: mismatched input ';' expecting COLON while matching a lexer rule
error(50): mql2.g4:11:11: syntax error: mismatched input ';' expecting COLON while matching a lexer rule
error(50): mql2.g4:12:10: syntax error: mismatched input ';' expecting COLON while matching a lexer rule
error(50): mql2.g4:16:16: syntax error: '{package com.proquest.mql.queryTranslator;}' came as a complete surprise to me while matching rule preamble
error(50): mql2.g4:17:1: syntax error: 'lexer' came as a complete surprise to me while looking for an identifier
error(50): mql2.g4:19:11: syntax error: '^' came as a complete surprise to me
error(50): mql2.g4:19:16: syntax error: '!' came as a complete surprise to me
...

The input file is:

grammar mql2;

options {
    output=AST;
    k=2;
}

tokens {
    AND_OP;
    OR_OP;
    FIELD_CODE;
    FC_SUFFIX;
    }


@parser::header {package com.company.mql.queryTranslator;}
@lexer::header {package com.company.mql.queryTranslator;}

parse   : mql^ EOF!
    ;

mql : WS!* mqlx WS!* ( and_or^ WS! mqlx WS!*)*;

and_or
    : and_operator
    | or_operator
    ;

mqlx : search_item
     | LPAREN! mql^ RPAREN!
     | field_code field_phrase RPAREN -> ^(FIELD_CODE field_code field_phrase)
     | field_code_prefix field_code_suffix ->^(FIELD_CODE field_code_prefix field_code_suffix)
     ;

field_code
    : w=WORD^ LPAREN!
    ;

field_phrase
    : (WS!* (WORD|PHRASE|AND|OR))+
    ;

field_code_prefix
    : WORD^ '.'!;

field_code_suffix
    : field_code  (WORD|PHRASE) RPAREN!;

and_operator
    : AND->AND_OP | (/*empty*/->AND_OP) ;

or_operator
    : OR->OR_OP;

search_item
        :  NOT^ WS!* mqlx
    |  (WORD|PHRASE);

LPAREN : '(';

RPAREN : ')';

AND : ('a'|'A')('n'|'N')('d'|'D');

NOT : ('n'|'N')('o'|'O')('t'|'T');

OR : ('o'|'O')('r'|'R');

fragment
DIGIT  : ('0'..'9') ;

fragment
LETTER  : ('a'..'z' | 'A'..'Z'| 'á'| '*' | '&' | '-' | '.' | ',' | '?' | '!' | '/' | '\u0080'..'\ufffe');

SPECIAL_CHAR : ('\'' | '&');

WORD    : (LETTER|DIGIT|SPECIAL_CHAR)+;

WS : ( '\t' | ' ' | '\r' | '\n' | '|')+ /*{ $channel = HIDDEN; }*/;

fragment
QUOTE :   '"' ;

PHRASE  :   QUOTE (options {greedy=false;} : . )* QUOTE ;

So the questions are:

  • Which version of ANTLR is this file written for? (I started with the ANTLR 4 reference and then moved on to the ANTLR 3 docs on their Confluence, but I still haven't worked out which version it targets.)
  • How do I fix ANTLR 4 errors such as syntax error: '^' came as a complete surprise to me or syntax error: '!' came as a complete surprise to me?

Solution

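  • As mentioned in the comments, the grammar is written for ANTLR 3. I recommend not using v3 grammars any more: that version is rather old. Converting it into a v4 grammar is easy: drop the options and tokens blocks, the tree operators ^ and !, and the -> rewrite rules, then tidy up the lexer rules: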

    grammar mql2;
    
    @parser::header {package com.company.mql.queryTranslator;}
    @lexer::header {package com.company.mql.queryTranslator;}
    
    parse   : mql EOF
        ;
    
    mql : mqlx (and_or mqlx)*;
    
    and_or
        : and_operator
        | or_operator
        ;
    
    mqlx : search_item
         | LPAREN mql RPAREN
         | field_code field_phrase RPAREN
         | field_code_prefix field_code_suffix
         ;
    
    field_code
        : WORD LPAREN
        ;
    
    field_phrase
        : (WORD|PHRASE|AND|OR)+
        ;
    
    field_code_prefix
        : WORD '.';
    
    field_code_suffix
        : field_code  (WORD|PHRASE) RPAREN;
    
    and_operator
        : AND;
    
    or_operator
        : OR;
    
    search_item
        : NOT mqlx
        | (WORD|PHRASE);
    
    LPAREN : '(';
    
    RPAREN : ')';
    
    AND : [aA] [nN] [dD];
    
    NOT : [nN] [oO] [tT];
    
    OR : [oO] [rR];
    
    fragment
    DIGIT  : [0-9];
    
    fragment
    LETTER  : [a-zA-Zá*&\-.,?!/\u0080-\ufffe];
    
    fragment
    SPECIAL_CHAR : ('\'' | '&');
    
    WORD    : (LETTER|DIGIT|SPECIAL_CHAR)+;
    
    WS : [\t \r\n|]+ -> channel(HIDDEN);
    
    fragment
    QUOTE :   '"' ;
    
    PHRASE  :   QUOTE .*? QUOTE ;
    

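    ANTLR 4 only gives you a parse tree. Unlike version 3, it does not let you transform that tree into an AST from within the grammar. If you need an AST, build it after parsing, for example with a generated visitor or listener.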
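
Since the AST has to be built in ordinary Java code after parsing, here is a minimal, self-contained sketch of the folding step that and_or^ performed in the v3 grammar: the operands and operators of mqlx (and_or mqlx)* are combined left to right so that each operator becomes the root of everything parsed so far. The names AstSketch, Ast and fold are hypothetical and do not come from ANTLR. In real code the two lists would presumably be filled from the generated rule context inside a visitor, which is not shown here because it depends on the generated parser classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Sketch only: folds a flat operand/operator sequence into a nested AST. */
final class AstSketch {

    /** Hypothetical AST node: a label plus zero or more children. */
    static final class Ast {
        final String label;
        final List<Ast> children = new ArrayList<>();

        Ast(String label, Ast... kids) {
            this.label = label;
            this.children.addAll(Arrays.asList(kids));
        }

        /** LISP-style rendering, e.g. (AND_OP cat dog). */
        @Override
        public String toString() {
            if (children.isEmpty()) {
                return label;
            }
            StringBuilder sb = new StringBuilder("(").append(label);
            for (Ast child : children) {
                sb.append(' ').append(child);
            }
            return sb.append(')').toString();
        }
    }

    /**
     * Combines operands left-associatively. operators.get(i) joins the tree
     * built so far with operands.get(i + 1), so there must be exactly one
     * more operand than operators.
     */
    static Ast fold(List<Ast> operands, List<String> operators) {
        if (operands.size() != operators.size() + 1) {
            throw new IllegalArgumentException("expected one more operand than operators");
        }
        Ast result = operands.get(0);
        for (int i = 0; i < operators.size(); i++) {
            // The operator becomes the new root, as '^' did in ANTLR 3.
            result = new Ast(operators.get(i), result, operands.get(i + 1));
        }
        return result;
    }

    public static void main(String[] args) {
        Ast tree = fold(
                Arrays.asList(new Ast("cat"), new Ast("dog"), new Ast("bird")),
                Arrays.asList("AND_OP", "OR_OP"));
        System.out.println(tree); // prints (OR_OP (AND_OP cat dog) bird)
    }
}
```

The labels AND_OP and OR_OP mirror the imaginary tokens of the original v3 grammar. Any other node representation would work equally well, since ANTLR 4 places no constraints on an AST you build yourself.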