
Custom error handler methods fail to handle token recognition errors


Here is my .g4 file:

grammar Hello;

start : compilation;
compilation : sql*;
sql : altercommand;
altercommand : ALTER TABLE SEMICOLON;
ALTER: 'alter';
TABLE: 'table';
SEMICOLON : ';';

My main class:

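import org.antlr.v4.runtime.ANTLRInputStream;
import org.antlr.v4.runtime.CommonTokenStream;

import java.io.IOException;

public class Main {
    public static void main(String[] args) throws IOException {
        // Input: a misspelled statement followed by a valid one.
        ANTLRInputStream ip = new ANTLRInputStream("altasdere table ; alter table ;");
        HelloLexer lex = new HelloLexer(ip);
        CommonTokenStream token = new CommonTokenStream(lex);
        HelloParser parser = new HelloParser(token);

        // Install the custom parser error strategy.
        parser.setErrorHandler(new CustomeErrorHandler());

        System.out.println(parser.start().toStringTree(parser));
    }
}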

My CustomeErrorHandler class:

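import org.antlr.v4.runtime.DefaultErrorStrategy;
import org.antlr.v4.runtime.Parser;
import org.antlr.v4.runtime.RecognitionException;
import org.antlr.v4.runtime.TokenStream;
import org.antlr.v4.runtime.misc.IntervalSet;

public class CustomeErrorHandler extends DefaultErrorStrategy {

    @Override
    public void recover(Parser recognizer, RecognitionException e) {
        super.recover(recognizer, e);
        TokenStream tokenStream = (TokenStream) recognizer.getInputStream();

        // If recovery stopped at a semicolon, skip it and resynchronize.
        if (tokenStream.LA(1) == HelloParser.SEMICOLON) {
            IntervalSet intervalSet = getErrorRecoverySet(recognizer);
            tokenStream.consume();
            consumeUntil(recognizer, intervalSet);
        }
    }
}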

When I give the input altasdere table ; alter table ; it won't parse the second command, because it has found an error in the first one. The output of my main class is:

line 1:0 token recognition error at: 'alta'
line 1:4 token recognition error at: 's'
line 1:5 token recognition error at: 'd'
line 1:6 token recognition error at: 'e'
line 1:7 token recognition error at: 'r'
line 1:8 token recognition error at: 'e'
line 1:9 token recognition error at: ' '
(start compilation)

Solution

  • The Definitive ANTLR 4 Reference, section 9.5 "Altering ANTLR’s Error Handling Strategy", says:

    The default error handling mechanism works very well, but there are a few atypical situations in which we might want to alter it.

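    Is your grammar so atypical that you need to process token recognition errors? These errors are reported by the lexer, not the parser, so an error strategy installed with parser.setErrorHandler is never consulted for them. Personally, I would write a grammar that produces no errors at the lexer level, like the following.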

    File Question.g4:

    grammar Question;
    
    question
    @init {System.out.println("Question last update 0712");}
        :   sql+ EOF
        ;
    
    sql
        :   alter_command
        |   erroneous_command
        ;
    
    alter_command
        :   ALTER TABLE SEMICOLON
            {System.out.println("Alter command found : " + $text);}
        ;
    
    erroneous_command
        :   WORD TABLE? SEMICOLON
            {System.out.println("Erroneous command found : " + $text);}
        ;
    
    ALTER     : 'alter' ;
    TABLE     : 'table' ;
    WORD      : [a-z]+ ;
    SEMICOLON : ';' ;
    WS        : [ \t\r\n]+ -> channel(HIDDEN) ;
    

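    Note that the WORD rule must come after ALTER and TABLE: when several lexer rules match the same longest input, ANTLR picks the rule defined first, so a WORD rule placed earlier would swallow the keywords.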
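
The effect of rule order can be illustrated outside ANTLR. The following is only a sketch built on java.util.regex, not ANTLR's implementation; the names LexerOrderDemo, tokenize and rules are hypothetical. It assumes the usual lexer policy: the longest match wins, and when two rules match the same length, the rule declared first wins.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LexerOrderDemo {

    // Rules in declaration order (LinkedHashMap keeps insertion order).
    // wordFirst = true declares WORD before the keywords, which is the wrong order.
    static Map<String, Pattern> rules(boolean wordFirst) {
        Map<String, Pattern> rules = new LinkedHashMap<>();
        if (wordFirst) {
            rules.put("WORD", Pattern.compile("[a-z]+"));
        }
        rules.put("ALTER", Pattern.compile("alter"));
        rules.put("TABLE", Pattern.compile("table"));
        if (!wordFirst) {
            rules.put("WORD", Pattern.compile("[a-z]+"));
        }
        rules.put("SEMICOLON", Pattern.compile(";"));
        rules.put("WS", Pattern.compile("[ \\t\\r\\n]+"));
        return rules;
    }

    // Try every rule at the current position. A later rule replaces the
    // current best only if it matches strictly more characters, so ties
    // go to the rule declared first.
    static List<String> tokenize(String input, Map<String, Pattern> rules) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            String bestName = null;
            int bestEnd = pos;
            for (Map.Entry<String, Pattern> rule : rules.entrySet()) {
                Matcher m = rule.getValue().matcher(input);
                m.region(pos, input.length());
                // lookingAt() anchors the match at the start of the region.
                if (m.lookingAt() && m.end() > bestEnd) {
                    bestName = rule.getKey();
                    bestEnd = m.end();
                }
            }
            if (bestName == null) {
                throw new IllegalArgumentException("no rule matches at index " + pos);
            }
            if (!bestName.equals("WS")) { // whitespace is skipped in this sketch
                tokens.add(bestName + ":" + input.substring(pos, bestEnd));
            }
            pos = bestEnd;
        }
        return tokens;
    }

    public static void main(String[] args) {
        String input = "altasdere table ; alter table ;";
        System.out.println(tokenize(input, rules(false)));
        System.out.println(tokenize(input, rules(true)));
    }
}
```

With the keywords declared first, alter and table come out as ALTER and TABLE tokens and only altasdere becomes a WORD. With WORD declared first, every lowercase word becomes a WORD and the keyword rules never win.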

    File t.text:

    altasdere table ; alter table ;
    

    Execution:

    $ grun Question question -tokens -diagnostics t.text
    [@0,0:8='altasdere',<WORD>,1:0]
    [@1,9:9=' ',<WS>,channel=1,1:9]
    [@2,10:14='table',<'table'>,1:10]
    [@3,15:15=' ',<WS>,channel=1,1:15]
    [@4,16:16=';',<';'>,1:16]
    [@5,17:17=' ',<WS>,channel=1,1:17]
    [@6,18:22='alter',<'alter'>,1:18]
    [@7,23:23=' ',<WS>,channel=1,1:23]
    [@8,24:28='table',<'table'>,1:24]
    [@9,29:29=' ',<WS>,channel=1,1:29]
    [@10,30:30=';',<';'>,1:30]
    [@11,31:31='\n',<WS>,channel=1,1:31]
    [@12,32:31='<EOF>',<EOF>,2:0]
    Question last update 0712
    Erroneous command found : altasdere table ;
    Alter command found : alter table ;
    

    As you can see, the erroneous input has been absorbed by the WORD token. Now it should be easy to process or ignore the erroneous command in the listener/visitor.