
ANTLR rule works on its own, but fails when included in another rule


I am trying to write an ANTLR grammar for a reparsed and retagged kconfig file (retagged to solve a couple of ambiguities). A simplified version of the grammar is:

grammar FailureExample;

options {
language = Java;
}


@lexer::header {
package parse.failure.example;
}

reload 
:   configStatement*
EOF
;

configStatement
: CONFIG IDENT
configOptions
;

configOptions
:   (type 
| defConfigStatement
| dependsOnStatement
| helpStatement
| rangeStatement
| defaultStatement
| selectStatement
| visibleIfStatement
| prompt
)*
;

type :  FAKE1;
dependsOnStatement: FAKE2;
helpStatement:  FAKE3;
rangeStatement: FAKE4;
defaultStatement:   FAKE5;
selectStatement:FAKE6;
visibleIfStatement:FAKE7;
prompt:FAKE8;   

defConfigStatement
: defConfigType expression
;

defConfigType
: DEF_BOOL
;

//expression parsing
primative
: IDENT
| L_PAREN expression R_PAREN
;

negationExpression
: NOT* primative
;

orExpression
: negationExpression (OR negationExpression)*
;

andExpression
: orExpression (AND orExpression)*
;

unequalExpression
: andExpression (NOT_EQUAL andExpression)?
;

equalExpression
: unequalExpression (EQUAL unequalExpression)?
;

expression
: equalExpression (BECOMES equalExpression)?
;

DEF_BOOL:    'def_bool';
CONFIG : 'config';  
COMMENT     : '#' .* ('\n'|'\r') {$channel = HIDDEN;};
AND         : '&&';
OR      : '||';
NOT         : '!';
L_PAREN     : '(';
R_PAREN     : ')';
BECOMES     : '::=';
EQUAL       :  '=';
NOT_EQUAL   : '!=';

FAKE1 : 'fake1';
FAKE2:   'fake2';
FAKE3:   'fake3';
FAKE4:   'fake4';
FAKE5:   'fake5';
FAKE6:   'fake6';
FAKE7:   'fake7';
FAKE8:   'fake8';

IDENT       : (LETTER | DIGIT | '_')+; // '+' rather than '*': a lexer rule should not match the empty string
WS  :   ( ' '
    | '\t'
    | '\r'
    | '\n'
    ) {$channel=HIDDEN;}
;

fragment LETTER : ('a'..'z' | 'A'..'Z') ;
fragment DIGIT : '0'..'9';

With input:

config HAVE_DEBUG_RAM_SETUP
def_bool n

I can set ANTLRWorks to parse just the second line (with the first line commented out), and I get the expected defConfigStatement node with the correct expression under it. However, if I exercise either the configOptions rule or the configStatement rule (with the first line uncommented), configOptions matches nothing and a NoViableAltException is thrown.

What would cause this behavior? I know the defConfigStatement rule is correct and parses on its own, but as soon as it is added as an alternative in another rule, it fails. I don't have conflicting rules, and I've placed the DEF_BOOL and DEF_TRISTATE rules first among my lexer rules, so they take priority over the other lexer rules.
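The priority argument relies on how the lexer resolves overlaps: roughly, the longest match wins, and among equally long matches the rule defined first wins. The sketch below is a rough Java model of that policy, not ANTLR's DFA-based implementation. The class name KeywordPriorityDemo is hypothetical, only three of the grammar's rules are modeled, and IDENT is modeled as one or more characters.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class KeywordPriorityDemo {
    // Rules in declaration order (LinkedHashMap keeps insertion order).
    // Hypothetical subset of the grammar: DEF_BOOL, CONFIG, IDENT.
    static final Map<String, Pattern> RULES = new LinkedHashMap<>();
    static {
        RULES.put("DEF_BOOL", Pattern.compile("def_bool"));
        RULES.put("CONFIG", Pattern.compile("config"));
        RULES.put("IDENT", Pattern.compile("[A-Za-z0-9_]+"));
    }

    static List<String> tokenize(String input) {
        List<String> out = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            // Whitespace is skipped, standing in for the hidden WS channel.
            if (Character.isWhitespace(input.charAt(pos))) {
                pos++;
                continue;
            }
            String bestType = null;
            int bestEnd = pos;
            for (Map.Entry<String, Pattern> rule : RULES.entrySet()) {
                Matcher m = rule.getValue().matcher(input);
                m.region(pos, input.length());
                // Only a strictly longer match replaces the current best,
                // so on a tie the rule declared earlier keeps the token.
                if (m.lookingAt() && m.end() > bestEnd) {
                    bestType = rule.getKey();
                    bestEnd = m.end();
                }
            }
            if (bestType == null) {
                throw new IllegalArgumentException("no rule matches at offset " + pos);
            }
            out.add(bestType + "(" + input.substring(pos, bestEnd) + ")");
            pos = bestEnd;
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [CONFIG(config), IDENT(HAVE_DEBUG_RAM_SETUP), DEF_BOOL(def_bool), IDENT(n)]
        System.out.println(tokenize("config HAVE_DEBUG_RAM_SETUP\ndef_bool n"));
        // prints [IDENT(def_boolean)]
        System.out.println(tokenize("def_boolean"));
    }
}
```

Under this policy, def_bool ties between DEF_BOOL and IDENT and goes to DEF_BOOL because it is declared first, while a longer word such as def_boolean still lexes as a single IDENT.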

Added after the edit: to complicate things further, if I move the defConfigStatement alternative to a different position in configOptions, it works, but other alternatives then fail.

Edit: now using the full (simplified) grammar.

In short: why does the rule work on its own but fail inside configOptions, especially since configOptions has the form (A | B | C)*?
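For reference, a (A | B | C)* loop like configOptions is an easy decision for an LL(1) parser. Every alternative starts with a different token (DEF_BOOL, FAKE1 to FAKE8), and none of the tokens that can continue an expression (OR, AND, NOT_EQUAL, EQUAL, BECOMES) starts an alternative. So one token of lookahead picks the alternative or exits the loop, and the order of the alternatives should not matter to a correctly generated parser. The hand-written Java sketch below illustrates this. It is not ANTLR-generated code: the class name is hypothetical, the input is assumed to be pre-tokenized into token-type strings, and expression is reduced to a single IDENT.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ConfigOptionsSketch {
    private final List<String> tokens; // token types only, terminated by "EOF"
    private int pos = 0;
    final List<String> trace = new ArrayList<>(); // rules entered, in order

    ConfigOptionsSketch(List<String> tokens) {
        this.tokens = tokens;
    }

    // One token of lookahead.
    private String la() {
        return tokens.get(pos);
    }

    private void match(String type) {
        if (!la().equals(type)) {
            throw new IllegalStateException("expected " + type + " but found " + la());
        }
        pos++;
    }

    // Stands in for the single-token rules (type : FAKE1; ... prompt : FAKE8;).
    private void oneTokenRule(String ruleName) {
        trace.add(ruleName);
        pos++;
    }

    // reload : configStatement* EOF ;
    void reload() {
        while (la().equals("CONFIG")) {
            configStatement();
        }
        match("EOF");
    }

    // configStatement : CONFIG IDENT configOptions ;
    void configStatement() {
        trace.add("configStatement");
        match("CONFIG");
        match("IDENT");
        configOptions();
    }

    // configOptions : (type | defConfigStatement | ... | prompt)* ;
    // The next token alone selects an alternative, so case order is irrelevant.
    void configOptions() {
        while (true) {
            switch (la()) {
                case "DEF_BOOL": defConfigStatement(); break;
                case "FAKE1": oneTokenRule("type"); break;
                case "FAKE2": oneTokenRule("dependsOnStatement"); break;
                case "FAKE3": oneTokenRule("helpStatement"); break;
                case "FAKE4": oneTokenRule("rangeStatement"); break;
                case "FAKE5": oneTokenRule("defaultStatement"); break;
                case "FAKE6": oneTokenRule("selectStatement"); break;
                case "FAKE7": oneTokenRule("visibleIfStatement"); break;
                case "FAKE8": oneTokenRule("prompt"); break;
                default: return; // e.g. CONFIG or EOF: no option starts here
            }
        }
    }

    // defConfigStatement : defConfigType expression ;
    // Simplification: expression is reduced to a single IDENT.
    void defConfigStatement() {
        trace.add("defConfigStatement");
        match("DEF_BOOL");
        match("IDENT");
    }

    public static void main(String[] args) {
        // Token types for the input: config HAVE_DEBUG_RAM_SETUP / def_bool n
        ConfigOptionsSketch p = new ConfigOptionsSketch(
                Arrays.asList("CONFIG", "IDENT", "DEF_BOOL", "IDENT", "EOF"));
        p.reload();
        System.out.println(p.trace); // prints [configStatement, defConfigStatement]
    }
}
```

Reordering the case labels in configOptions leaves the result unchanged, which is why a failure that depends on where defConfigStatement sits in the list points at the tool rather than the grammar.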


Solution

  • When I parse the input:

    config HAVE_DEBUG_RAM_SETUP
    def_bool n
    

    with the parser generated from your grammar, I get the following parse tree:

    [Image: parse tree for the input above, exported from the ANTLRWorks debugger]

    So I see no issues here. My guess is that you're using ANTLRWorks' interpreter: don't. It's buggy. Always test your grammar with a class of your own, or use ANTLRWorks' debugger (press CTRL+D to launch it). The debugger works like a charm (without the package declaration, by the way). The image above is an export from the debugger.

    EDIT

    If the debugger doesn't work, try (temporarily) removing the package declaration. Note that you're only declaring a package for the lexer, not the parser, but that might be caused by posting a minimal grammar. You could also try changing the port number the debugger connects to, since the port may already be in use (see File -> Preferences -> Debugger tab).
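To test the "port already in use" theory outside ANTLRWorks, you can try to bind a server socket on the port shown in the Debugger tab. The minimal Java sketch below does this. The class name PortCheck is hypothetical, the port is passed on the command line, and this is a rough local check rather than a definitive diagnosis.

```java
import java.io.IOException;
import java.net.ServerSocket;

public class PortCheck {
    // Returns true if this process can bind the port, i.e. nothing on this
    // machine is currently listening on it.
    static boolean isPortFree(int port) {
        try (ServerSocket socket = new ServerSocket(port)) {
            return true; // bound successfully; closed again by try-with-resources
        } catch (IOException e) {
            return false; // typically java.net.BindException: address already in use
        }
    }

    public static void main(String[] args) {
        if (args.length != 1) {
            System.err.println("usage: java PortCheck <port>");
            return;
        }
        int port = Integer.parseInt(args[0]);
        System.out.println("port " + port + (isPortFree(port) ? " is free" : " is in use"));
    }
}
```

If the port is reported as in use, another process (for example, an earlier debug session that never exited) may be holding it, and choosing a different port in the preferences is the simpler fix.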