How to use same word in different optional Lexer tokens


I simplified my ANTLR4 grammar to this:

grammar test;

directives:
    ('[' directive ']')* EOF;


directive:      
      KEY1 OPERATOR OPTIONS1
    | KEY2 OPERATOR OPTIONS2;
OPERATOR: '=';

KEY1: 'Key1';
KEY2: 'Key2';

OPTIONS1: 'a'|'b'|'c';
OPTIONS2: 'c'|'d'|'e';

When I try to use this grammar to parse:

[Key1=a][Key2=c]

the parser gives this error:

line 1:14 mismatched input 'c' expecting OPTIONS2

In my real project, OPTIONS1 and OPTIONS2 are different enum data types, and 'c' is a value that belongs to both of them.


Solution

  • You should split the value that the two token rules share out into a token of its own. Your current rules overlap on 'c':

    OPTIONS1: 'a'|'b'|'c';
    OPTIONS2: 'c'|'d'|'e';
    

    After the split, your lexer rules will be:

    OPTIONS1: 'a'|'b';
    OPTIONS2: 'd'|'e';
    OPTIONS3: 'c';
    

    and the parser rule becomes:

    directive:
          KEY1 OPERATOR (OPTIONS1 | OPTIONS3)
        | KEY2 OPERATOR (OPTIONS2 | OPTIONS3);
    

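    This happens because the lexer tokenizes the input on its own, before the parser runs, and does not know which parser rule the token will end up in. When several lexer rules match the same text, the rule defined first in the grammar wins, so your 'c' is always tokenized as OPTIONS1 and never as OPTIONS2. The parser then sees OPTIONS1 where the KEY2 alternative expects OPTIONS2.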

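    Alternatively, the literals can be inlined directly in the parser rule, in which case ANTLR defines implicit tokens for them (a bit like macros). I don't remember the exact syntax, but it should look roughly like this, and it will work too: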

    directive:      
          KEY1 OPERATOR ('a'|'b'|'c')
        | KEY2 OPERATOR ('c'|'d'|'e');
    

    You had better check the current ANTLR syntax for inlining literals. The drawback is that you will not see OPTIONS1 and OPTIONS2 as named tokens in the tree view.
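
To illustrate why 'c' comes out as OPTIONS1, here is a minimal Python sketch of the tie-breaking described above: at each position the longest match wins, and on a tie the rule listed first wins. This is not ANTLR's implementation. The names `tokenize`, `ORIGINAL_RULES`, and `SPLIT_RULES` are hypothetical, and plain regular expressions stand in for the lexer rules.

```python
import re

# Hypothetical stand-ins for the lexer rules, listed in grammar order.
# The '[' and ']' literals from the parser rule are included as tokens too.
COMMON_RULES = [
    ("'['", r"\["),
    ("']'", r"\]"),
    ("OPERATOR", r"="),
    ("KEY1", r"Key1"),
    ("KEY2", r"Key2"),
]

# Original grammar: both OPTIONS rules can match 'c'.
ORIGINAL_RULES = COMMON_RULES + [
    ("OPTIONS1", r"[abc]"),
    ("OPTIONS2", r"[cde]"),
]

# Grammar after the split: 'c' has exactly one rule.
SPLIT_RULES = COMMON_RULES + [
    ("OPTIONS1", r"[ab]"),
    ("OPTIONS2", r"[de]"),
    ("OPTIONS3", r"c"),
]


def tokenize(text, rules):
    """Return (rule_name, matched_text) pairs for the whole input.

    Assumption: longest match wins, and on a tie the earlier rule wins.
    The patterns here are simple enough that re.match gives the longest match.
    """
    compiled = [(name, re.compile(pattern)) for name, pattern in rules]
    tokens = []
    pos = 0
    while pos < len(text):
        best_name, best_len = None, 0
        for name, pattern in compiled:
            match = pattern.match(text, pos)
            # Strictly longer only, so an earlier rule keeps a tie.
            if match and match.end() - pos > best_len:
                best_name, best_len = name, match.end() - pos
        if best_name is None:
            raise ValueError(f"no rule matches {text[pos]!r} at position {pos}")
        tokens.append((best_name, text[pos:pos + best_len]))
        pos += best_len
    return tokens
```

With the input `[Key1=a][Key2=c]`, `tokenize(text, ORIGINAL_RULES)` labels the final 'c' as OPTIONS1, which is the token the parser rejects after KEY2. With `SPLIT_RULES` the same 'c' is labelled OPTIONS3, which both alternatives of the rewritten directive rule accept.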