
Antlr4 parser ignoring lexer rules and generating implicit tokens


As a simple example, let's say I have the following.

parse_rule0 : parse_rule1 ',' parse_rule2 ';' ;

parse_rule1 : ... ;

parse_rule2: ... ;

PUNCTUATION : ',' | '.'| ';' | ':' ;

When ANTLR4 (specifically the ANTLR4 VS Code extension) generates tokens, it ignores my PUNCTUATION rule (just an example) and creates an implicit token, T__1 for example. I can't find any resources online on matching a specific token out of a more general lexer rule. It seems senseless to create a lexer rule for every possible literal you might want to match while parsing.

I've searched high and low for a solution to this problem. If I want the parser to match a specific character that is already covered by a lexer rule, how do I stop ANTLR from generating an implicit token and make it use my lexer rules instead?


Solution

  • Explicit use of token

    parse_rule0 : parse_rule1 PUNCTUATION parse_rule2 PUNCTUATION ;
    
    parse_rule1 : ... ;
    
    parse_rule2: ... ;
    
    PUNCTUATION : ',' | '.'| ';' | ':' ;
    

  • Lexical rule for every literal

    parse_rule0 : parse_rule1 ',' parse_rule2 ';' ;
    
    parse_rule1 : ... ;
    
    parse_rule2: ... ;
    
    COMMA: ',';
    DOT: '.';
    SEMI: ';';
    COLON: ':';
    

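    Unfortunately, ANTLR only maps a literal in a parser rule to a lexer rule when that rule consists of exactly that one literal. It cannot match the literal against a more complex lexer rule, even one that contains nothing but alternatives.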
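
    To make the "lexical rule for every literal" option concrete, here is a minimal sketch of a complete combined grammar. The grammar name Punct, the ID and WS rules, the placeholder bodies of parse_rule1 and parse_rule2, and the punctuation parser rule are all assumptions added for illustration; they are not part of the original question. The idea is that the grouping moves from the lexer into a parser rule, so each character keeps its own token type while "any punctuation" can still be referenced in one place.

```antlr
grammar Punct;

// The literals ',' and ';' below resolve to COMMA and SEMI, because each
// has a lexer rule consisting of exactly that one literal. No implicit
// T__n tokens should be generated for them.
parse_rule0 : parse_rule1 ',' parse_rule2 ';' EOF ;

// Hypothetical placeholder bodies; the original rules are elided.
parse_rule1 : ID ;
parse_rule2 : ID ;

// Parser-level grouping that takes the place of the PUNCTUATION lexer rule.
// Use it wherever any of the four characters is acceptable.
punctuation : COMMA | DOT | SEMI | COLON ;

// One lexer rule per literal.
COMMA : ',' ;
DOT   : '.' ;
SEMI  : ';' ;
COLON : ':' ;

// Assumed helper rules so that the sketch is self-contained.
ID : [a-zA-Z_]+ ;
WS : [ \t\r\n]+ -> skip ;
```

    With this layout, an input such as `a, b;` should tokenize as ID COMMA ID SEMI, and the parser can refer to the tokens either by name (COMMA) or by literal (',').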