
Resolving ANTLR ambiguity while matching specific Types


I'm just starting to explore ANTLR and I'm trying to match this format: (test123 A0020 )

Where :

  • test123 : an identifier of at most 10 characters (letters and digits)
  • A : time indicator (AM or PM), a single letter that is either "A" or "P"
  • 0020 : four digits representing the time

I tried this grammar :

    IDENTIFIER
     : ( LETTER | DIGIT )+
     ;

    INT
     : DIGIT+
     ;

    fragment DIGIT
     : [0-9]
     ;

    fragment LETTER
     : [A-Z]
     ;

    WS : [ \t\r\n(\s)+]+ -> channel(HIDDEN) ;

    formatter
     : '(' information ')'
     ;

    information
     : information '/' 'A' INT
     | IDENTIFIER
     ;

How can I resolve the ambiguity so that the time format is matched as 'A' INT rather than as IDENTIFIER? Also, how can I add checks such as a maximum token length to the identifier? I know that this doesn't work in ANTLR: IDENTIFIER : (DIGIT | LETTER ) {2,10}

UPDATE:

I changed the rules to add semantic checks, but I still have the same ambiguity between the identifier and the time format. Here are the modified rules:

    formatter
     : information
     | information '-' time
     ;

    time
     : timeMode timeCode
     ;

    timeMode
     : {getCurrentToken().getText().matches("[A,C]")}? MOD
     ;

    timeCode
     : {getCurrentToken().getText().matches("[0-9]{4}")}? INT
     ;

    information
     : {getCurrentToken().getText().length() <= 10}? IDENTIFIER
     ;

    MOD
     : 'A' | 'C'
     ;

The problem shows up in the parse tree: A0023 is matched to timeMode, and the parser complains that the timeCode is missing.


Solution

  • Here is a way to handle it:

    grammar Test;
    
    @lexer::members {
      private boolean isAhead(int maxAmountOfCharacters, String pattern) {
        final Interval ahead = new Interval(this._tokenStartCharIndex, this._tokenStartCharIndex + maxAmountOfCharacters - 1);
        return this._input.getText(ahead).matches(pattern);
      }
    }
    
    parse
     : formatter EOF
     ;
    
    formatter
     : information ( '-' time )?
     ;
    
    time
     : timeMode timeCode
     ;
    
    timeMode
     : TIME_MODE
     ;
    
    timeCode
     : {getCurrentToken().getType() == IDENTIFIER_OR_INTEGER && getCurrentToken().getText().matches("\\d{4}")}?
       IDENTIFIER_OR_INTEGER
     ;
    
    information
     : {getCurrentToken().getType() == IDENTIFIER_OR_INTEGER && getCurrentToken().getText().matches("\\w*[a-zA-Z]\\w*")}?
       IDENTIFIER_OR_INTEGER
     ;
    
    IDENTIFIER_OR_INTEGER
     : {!isAhead(6, "[AP]\\d{4}(\\D|$)")}? [a-zA-Z0-9]+
     ;
    
    TIME_MODE
     : [AP]
     ;
    
    SPACES
     : [ \t\r\n] -> skip
     ;
    

    A small test class:

    import org.antlr.v4.runtime.ANTLRInputStream;
    import org.antlr.v4.runtime.CommonTokenStream;

    public class Main {

        // Pretty-prints a LISP-style tree: each nested '(' starts a new, deeper-indented line.
        private static void indent(String lispTree) {

            int indentation = -1;

            for (final char c : lispTree.toCharArray()) {
                if (c == '(') {
                    indentation++;
                    for (int i = 0; i < indentation; i++) {
                        System.out.print(i == 0 ? "\n  " : "  ");
                    }
                }
                else if (c == ')') {
                    indentation--;
                }
                System.out.print(c);
            }
        }

        public static void main(String[] args) throws Exception {
            // TestLexer and TestParser are generated by ANTLR from the grammar above.
            TestLexer lexer = new TestLexer(new ANTLRInputStream("1P23 - A0023"));
            TestParser parser = new TestParser(new CommonTokenStream(lexer));
            indent(parser.parse().toStringTree(parser));
        }
    }
    

    will print:

    (parse 
      (formatter 
        (information 1P23) - 
        (time 
          (timeMode A) 
          (timeCode 0023))) <EOF>)
    

    for the input "1P23 - A0023".
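
    The lexer predicate in the grammar can also be tried outside ANTLR. The following is only a sketch under assumptions: the class name LookaheadSketch and this string-based isAhead are hypothetical stand-ins for the lexer member of the same name, and plain substring clamping replaces the char stream's getText(Interval). It shows which inputs the pattern "[AP]\\d{4}(\\D|$)" treats as a time, so that IDENTIFIER_OR_INTEGER is blocked and TIME_MODE can match the single letter:

```java
import java.util.regex.Pattern;

public class LookaheadSketch {

    // Same pattern as in the lexer predicate: 'A' or 'P', four digits,
    // then either a non-digit or the end of the inspected text.
    static final Pattern TIME = Pattern.compile("[AP]\\d{4}(\\D|$)");

    // Hypothetical stand-in for the lexer's isAhead(...): inspects at most
    // `max` characters of `input` starting at `start`, clamped to the input end.
    static boolean isAhead(String input, int start, int max, Pattern pattern) {
        int end = Math.min(input.length(), start + max);
        return pattern.matcher(input.substring(start, end)).matches();
    }

    public static void main(String[] args) {
        String input = "1P23 - A0023";
        System.out.println(isAhead(input, 0, 6, TIME));     // false: "1P23 -" is not a time
        System.out.println(isAhead(input, 7, 6, TIME));     // true: "A0023" at the end of input
        System.out.println(isAhead(input, 8, 6, TIME));     // false: "0023" has no leading A or P
        System.out.println(isAhead("A00234", 0, 6, TIME));  // false: a fifth digit follows
        System.out.println(isAhead("A0023 x", 0, 6, TIME)); // true: a non-digit follows
    }
}
```

    Whenever the check returns true, the predicate in front of IDENTIFIER_OR_INTEGER fails, so the lexer falls back to TIME_MODE for the letter and then tokenizes the four digits separately.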

    EDIT

    ANTLR can also display the parse tree in a UI component. If you do this instead:

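    import java.util.Arrays;

    import org.antlr.v4.gui.TreeViewer; // package used by ANTLR 4.5.1 and later
    import org.antlr.v4.runtime.ANTLRInputStream;
    import org.antlr.v4.runtime.CommonTokenStream;

    public class Main {

        public static void main(String[] args) throws Exception {
            TestLexer lexer = new TestLexer(new ANTLRInputStream("1P23 - A0023"));
            TestParser parser = new TestParser(new CommonTokenStream(lexer));
            // Opens a Swing dialog that renders the parse tree.
            new TreeViewer(Arrays.asList(TestParser.ruleNames), parser.parse()).open();
        }
    }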
    

    a dialog displaying the parse tree will appear.

    Tested with ANTLR version 4.5.2-1
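
    The grammar above does not enforce the 10-character limit on the identifier that the question asks about. One possible way, shown here only as a sketch, is to add the same length check the question's update already uses (getCurrentToken().getText().length() <= 10) to the predicate of the information rule. The class TokenChecks and its method names below are hypothetical; they mirror the parser predicates as plain Java so the conditions can be tested on their own:

```java
public class TokenChecks {

    // Mirrors the `information` predicate plus the assumed length limit:
    // only letters and digits, at least one letter, at most 10 characters.
    static boolean isInformation(String text) {
        return text.length() <= 10
                && text.matches("[a-zA-Z0-9]*[a-zA-Z][a-zA-Z0-9]*");
    }

    // Mirrors the `timeCode` predicate: exactly four digits.
    static boolean isTimeCode(String text) {
        return text.matches("\\d{4}");
    }

    public static void main(String[] args) {
        System.out.println(isInformation("1P23"));        // true
        System.out.println(isInformation("abcdefghijk")); // false: 11 characters
        System.out.println(isInformation("0023"));        // false: no letter
        System.out.println(isTimeCode("0023"));           // true
        System.out.println(isTimeCode("00234"));          // false: five digits
    }
}
```

    Like the answer's own predicate, this requires at least one letter in the identifier, so a purely numeric token is never accepted as information.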