
Ignored underscore letter


I run the input "___sad" through both the ANTLRWorks interpreter and the debugger, using the following grammar:

    grammar identTest;

    options
    {
        language = Java;
        output=AST;
    }

    goal: identifier;

    fragment Letter: (('a'..'z') | ('A'..'Z'));
    fragment Digit : '0' .. '9';
    identifier :IDENTIFIER;

    IDENTIFIER: Letter+;
    WS:(' '|'\r'|'\t'|'\u000C'|'\n') {$channel=HIDDEN;};

Interpreter output: [screenshot: interpreter]

Debugger output: [screenshot: debugger]

The interpreter includes the underscore characters in the result, while the debugger seems to just ignore them. I expected some kind of exception in this case, since the grammar only defines the letters 'a'..'z' and 'A'..'Z'. What is wrong with my grammar?


Solution

  • Don't use the interpreter: it's buggy.

    Using the debugger you can view the warnings/errors/exceptions your parser produces after pressing the Output button (lower left corner). When doing so, you will see the following:

    .../__Test___input.txt line 1:0 no viable alternative at character '_'
    .../__Test___input.txt line 1:1 no viable alternative at character '_'
    .../__Test___input.txt line 1:2 no viable alternative at character '_'
    

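    These messages come from the lexer, not the parser: it reports each underscore, skips it, and continues tokenizing, so the parser only ever sees the IDENTIFIER token for "sad".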

    If you don't want your lexer to recover from such "no viable alternative" errors, create a fall-through lexer rule (called OTHER here) that matches any single character, place it after the other lexer rules, and throw an exception from it:

    grammar identTest;
    
    options       
    {   
        language = Java;
        output=AST;
    }
    
    
    goal       : identifier;
    identifier : IDENTIFIER;
    
    IDENTIFIER : Letter+;
    WS         : (' '|'\r'|'\t'|'\u000C'|'\n') {$channel=HIDDEN;};
    OTHER      : . {throw new RuntimeException("unknown char: '" + $text + "'");};
    
    fragment Letter : (('a'..'z') | ('A'..'Z'));
    fragment Digit  : '0' .. '9';
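
To make the difference between the two behaviours concrete, here is a minimal, hand-written Java sketch. It is not ANTLR-generated code and does not use the ANTLR runtime: the class `IdentLexerSketch`, the `tokenize` method and its `strict` flag are hypothetical names invented for this illustration. With `strict == false` it imitates the default recovery (report the character, skip it, carry on); with `strict == true` it imitates the OTHER fall-through rule by throwing on the first unknown character. Column numbers assume single-line input.

```java
import java.util.ArrayList;
import java.util.List;

/** Hand-written stand-in for the identTest lexer (illustration only, not ANTLR output). */
final class IdentLexerSketch {

    /** Mirrors the Letter fragment: 'a'..'z' | 'A'..'Z'. */
    static boolean isLetter(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    /** Mirrors the WS rule: space, CR, tab, form feed, LF. */
    static boolean isWhitespace(char c) {
        return c == ' ' || c == '\r' || c == '\t' || c == '\f' || c == '\n';
    }

    /**
     * Returns the text of every IDENTIFIER found in the input.
     * strict == false: record an error for an unknown character, skip it, continue.
     * strict == true:  throw on the first unknown character (like the OTHER rule).
     */
    static List<String> tokenize(String input, boolean strict, List<String> errors) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (isLetter(c)) {
                // IDENTIFIER : Letter+ (consume the whole run of letters)
                int start = i;
                while (i < input.length() && isLetter(input.charAt(i))) {
                    i++;
                }
                tokens.add(input.substring(start, i));
            } else if (isWhitespace(c)) {
                // WS is sent to the hidden channel, so no token is emitted here
                i++;
            } else if (strict) {
                throw new RuntimeException("unknown char: '" + c + "'");
            } else {
                // Recovery: report the character, drop it, keep going
                errors.add("line 1:" + i + " no viable alternative at character '" + c + "'");
                i++;
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        List<String> errors = new ArrayList<>();
        System.out.println(tokenize("___sad", false, errors));
        errors.forEach(System.out::println);
        try {
            tokenize("___sad", true, errors);
        } catch (RuntimeException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In lenient mode the three underscores only show up in the error list and "sad" still comes out as an identifier, which matches what the debugger output shows. In strict mode the first underscore aborts tokenizing, which is the behaviour the question asked for.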