
Simple Xtext example generates grammar that Antlr4 doesn't like - who's to blame?


While using Xtext, I came across a problem, and I am not sure whether Antlr4 or Xtext is at fault or whether I'm just missing something. I understand that Antlr4 is not supported by Xtext, but it seems like this particular case should not cause a problem.

Here is a simple Xtext file:

grammar com.github.jsculley.antlr4.Test with org.eclipse.xtext.common.Terminals
generate test "http://www.github.com/jsculley/antlr4/test"
aRule:
    name=STRING
;

STRING is a terminal rule that Xtext defines in org.eclipse.xtext.common.Terminals:

terminal STRING : 
            '"' ( '\\' . /* 'b'|'t'|'n'|'f'|'r'|'u'|'"'|"'"|'\\' */ | !('\\'|'"') )* '"' |
            "'" ( '\\' . /* 'b'|'t'|'n'|'f'|'r'|'u'|'"'|"'"|'\\' */ | !('\\'|"'") )* "'"
        ; 

The generated Antlr grammar has the following rule:

RULE_STRING : ('"' ('\\' .|~(('\\'|'"')))* '"'|'\'' ('\\' .|~(('\\'|'\'')))* '\'');

The Antlr 3.5.2 tool has no problem with this rule, but the Antlr4 tool spits out the following errors:

error(50): InternalTest.g:102:29: syntax error: '(' came as a complete surprise to me while looking for lexer rule element
error(50): InternalTest.g:102:62: syntax error: '(' came as a complete surprise to me while looking for lexer rule element
error(50): InternalTest.g:102:74: syntax error: mismatched input ')' expecting SEMI while matching a lexer rule
error(50): InternalTest.g:106:25: syntax error: '(' came as a complete surprise to me while looking for lexer rule element
error(50): InternalTest.g:106:36: syntax error: mismatched input ')' expecting SEMI while matching a lexer rule

Antlr4 doesn't like the extra (and seemingly unnecessary) set of parentheses around the group after each '~' operator. So the question is: is Xtext generating a bad grammar, or is Antlr4 failing to handle a valid construct?


Solution

  • It seems that ANTLR 4 does not handle parentheses correctly: for example, the parser issues a mutual left recursion error when the left-recursive part of a rule is in parentheses.

    So just remove the redundant parentheses, and ANTLR 4 should generate a parser that is fully compatible with ANTLR 3. I ported a PL/SQL grammar from ANTLR 3 to ANTLR 4. Moreover, ANTLR 4 has a more powerful parsing algorithm than the previous version.
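
    For the rule in the question, removing the redundant parentheses might look like the sketch below. This is a hand-edited copy of the generated RULE_STRING, not something Xtext produces, and it assumes that ANTLR 4 accepts a single parenthesized set of character literals after '~' (it is the doubled parentheses that the error positions point at). Everything else in the rule is left as generated.

    ```antlr
    // Hand-edited copy of the generated lexer rule (an illustration, not Xtext output).
    // Only change: each ~(( ... )) is collapsed to ~( ... ).
    RULE_STRING : ('"' ('\\' .|~('\\'|'"'))* '"'|'\'' ('\\' .|~('\\'|'\''))* '\'');
    ```

    Because InternalTest.g is a generated file, an edit like this would presumably be overwritten the next time the grammar is regenerated, so it is more useful for confirming the cause than as a permanent fix.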