
Force ANTLR (version 3) to match lexer rule


I have the following ANTLR (version 3) grammar:

grammar GRM;


options
{
    language = C;
    output = AST;
}


create_statement : CREATE_KEYWORD SPACE_KEYWORD FILE_KEYWORD SPACE_KEYWORD value -> ^(value);

value : NUMBER | STRING;


CREATE_KEYWORD : 'CREATE';

FILE_KEYWORD : 'FILE';

SPACE_KEYWORD : ' ';


NUMBER : DIGIT+;

STRING : (LETTER | DIGIT)+;


fragment DIGIT : '0'..'9';

fragment LETTER : 'a'..'z' | 'A'..'Z';

With this grammar, I can successfully parse strings like CREATE FILE dump or CREATE FILE output. However, parsing a string like CREATE FILE file fails. ANTLR matches the text file (in the string) with the lexer rule FILE_KEYWORD, which is not the match I was expecting; I expected it to match the lexer rule STRING.

How can I force ANTLR to do this?


Solution

  • Your problem seems to be a variant of the classic contextual keyword vs. identifier issue.

    Either value should be a lexer rule rather than a parser rule (by the time the parser runs, the input has already been split into tokens, so it is too late to reclassify them), or you should reorder the rules, or both.

    Hence using VALUE : NUMBER | STRING (a lexer rule) instead of the lower-case value (a parser rule) will help. The order of the lexer rules is also important: the definition of ID ("VALUE" in your code) usually comes after the keyword definitions.

    See also: 'IDENTIFIER' rule also consumes keyword in ANTLR Lexer grammar

    grammar GRM;
    
    
    options
    {
        language = C;
        output = AST;
    }
    
    
    create_statement : CREATE_KEYWORD SPACE_KEYWORD FILE_KEYWORD SPACE_KEYWORD value -> ^(value);
    
    
    CREATE_KEYWORD : 'CREATE';
    
    FILE_KEYWORD : 'FILE';
    
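    // Assumption: LETTER and DIGIT are ordinary (non-fragment) lexer rules,
    // defined as below; fragment rules never reach the parser, so value
    // could not reference them otherwise.
    value : (LETTER | DIGIT)+ | FILE_KEYWORD | CREATE_KEYWORD;

    LETTER : 'a'..'z' | 'A'..'Z';

    DIGIT : '0'..'9';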
    
    SPACE_KEYWORD : ' ';
    

    This works for me in ANTLRWorks for the input CREATE FILE file, and also for the input CREATE FILE FILE if needed, because value accepts the keyword tokens as alternatives.
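
The idea behind the answer can be illustrated with a small toy model. The sketch below is not ANTLR's generated code (ANTLR 3 builds a DFA-based lexer); it assumes a simplified "longest match wins, ties go to the earlier rule" tokenizer, and the names TOKEN_RULES, tokenize, VALUE_TYPES and parse_create_statement are hypothetical. It shows that the lexer classifies FILE as FILE_KEYWORD without knowing where it appears in the statement, so the parser-level value rule has to list the keyword tokens itself.

```python
import re

# Ordered like the lexer rules in a grammar: keywords first, general rules last.
# Hypothetical toy model, not ANTLR 3's actual lexer algorithm.
TOKEN_RULES = [
    ("CREATE_KEYWORD", re.compile(r"CREATE")),
    ("FILE_KEYWORD", re.compile(r"FILE")),
    ("SPACE_KEYWORD", re.compile(r" ")),
    ("NUMBER", re.compile(r"[0-9]+")),
    ("STRING", re.compile(r"[A-Za-z0-9]+")),
]


def tokenize(text):
    """Split text into (type, text) pairs: longest match wins, ties go to the earlier rule."""
    tokens = []
    pos = 0
    while pos < len(text):
        best = None
        for name, pattern in TOKEN_RULES:
            m = pattern.match(text, pos)
            # Only a strictly longer match replaces the current best,
            # so on a tie the rule listed first is kept.
            if m and (best is None or m.end() > best[1]):
                best = (name, m.end())
        if best is None:
            raise ValueError(f"unexpected character at offset {pos}: {text[pos]!r}")
        tokens.append((best[0], text[pos:best[1]]))
        pos = best[1]
    return tokens


# Token types accepted by the parser-level `value` rule. Listing the keyword
# tokens here is what lets a keyword appear where a value is expected.
VALUE_TYPES = {"NUMBER", "STRING", "FILE_KEYWORD", "CREATE_KEYWORD"}


def parse_create_statement(text):
    """Return the value text of a CREATE FILE statement, or raise ValueError."""
    tokens = tokenize(text)
    expected = ["CREATE_KEYWORD", "SPACE_KEYWORD", "FILE_KEYWORD", "SPACE_KEYWORD"]
    if len(tokens) != 5 or [t for t, _ in tokens[:4]] != expected:
        raise ValueError("not a CREATE FILE statement")
    value_type, value_text = tokens[4]
    if value_type not in VALUE_TYPES:
        raise ValueError(f"{value_type} is not allowed as a value")
    return value_text
```

In this model, tokenize("FILE") yields a FILE_KEYWORD token while tokenize("FILES") yields a STRING token, and parse_create_statement("CREATE FILE FILE") still succeeds because FILE_KEYWORD is one of the accepted value types. Removing the keyword entries from VALUE_TYPES reproduces the failure described in the question.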