Search code examples
nestedcharantlrgrammarantlrworks

Nested brackets/chars '(' and ')' in grammar/ANTLRWorks warning: Decision can match input such as ... using multiple alternatives


The grammar below parses ( left part = right part # comment ), # comment is optional.

Two questions:

  1. Sometimes warning (ANTLRWorks 1.4.2): Decision can match input such as "{Int, Word}" using multiple alternatives: 1, 2 (referencing id2)
    But only sometimes!
  2. The next extension should be that the comment (id2) can contain chars '(' and ')'.

The grammar:

grammar NestedBrackets1a1;

//==========================================================
// Lexer Rules
//==========================================================

Int
  :  Digit+
  ;

fragment Digit
  :  '0'..'9'
  ;

Special
  : ( TCUnderscore | TCQuote )
  ;
TCListStart   : '(' ; 
TCListEnd     : ')' ;   
fragment TCUnderscore  : '_' ;
fragment TCQuote       : '"' ;

// A word must start with a letter
Word
  :  ( 'a'..'z' | 'A'..'Z' | Special ) ('a'..'z' | 'A'..'Z' | Special | Digit )*
  ;

Space
  :  ( ' ' | '\t' | '\r' | '\n' ) { $channel = HIDDEN; }
  ;

//==========================================================
// Parser Rules
//==========================================================

assignment
  :  TCListStart id1 '=' id1 ( comment )? TCListEnd
  ;

id1
  :  Word+
  ;

comment
  : '#' ( id2 )*
  ;

id2
  :  ( Word | Int )+
  ;

Solution

  • ANTLRStarter wrote:

    Sometimes warning (ANTLRWorks 1.4.2): Decision can match input such as "{Int, Word}" using multiple alternatives: 1, 2 (referencing id2) But only sometimes!

    No, the grammar you posted will always produce this warning. Perhaps you don't always notice it (your IDE-plugin or ANTLRWorks might show it in a tab you don't have opened), but the warning is there. Convince yourself by creating a lexer/parser from the command line:

    java -cp antlr-3.4-complete.jar org.antlr.Tool NestedBrackets1a1.g
    

    will produce:

    warning(200): NestedBrackets1a1.g:49:19:
    Decision can match input such as "{Int, Word}" using multiple alternatives: 1, 2
    
    As a result, alternative(s) 2 were disabled for that input
    

    This is because you have a * after ( id2 ) inside your comment rule, and id2 also is a repetition of tokens: ( Word | Int )+. Let's say your input is "# foo bar" (a # followed by two Word tokens). ANTLR can now parse the input in more than 1 way: the 2 tokens "foo" and "bar" could be matched by ( id2 )*, where id2 matches a single Word token at a time, but "foo" and "bar" could also be matches in one go of the id2 rule.

    Look at the merged rules:

    comment
      :  '#' ( ( Word | Int )+ )*
      ;
    

    See how you're repeating a repetition: ( ( ... )+ )*? This is usually a problem, as it is in your case.

    Resolve this problem by either replacing the * with a ?:

    comment
      :  '#' ( id2 )?
      ;
    
    id2
      :  ( Word | Int )+
      ;
    

    or by removing the +:

    comment
      :  '#' ( id2 )*
      ;
    
    id2
      :  ( Word | Int )
      ;
    

    ANTLRStarter wrote:

    The next extension should be that the comment (id2) can contain chars '(' and ')'.

    That is asking for trouble since a comment is followed by a TCListEnd, which is a ). I don't recommend letting a comment match ).

    EDIT

    Note that comments are usually stripped from the source file while tokenizing the input source. That way you don't need to account for them in your parser rules. You can do that by "skipping" these tokens in a lexer rule:

    Comment
      :  '#' ~('\r' | '\n')* {skip();}
      ;