
How to rewrite this grammar so it's not ambiguous anymore


I am currently working on a grammar which should allow me to define local and global arrays or variables.
The local ones start with an underscore, and that is the only difference between the names. There are no special keywords to define whether a declaration is local or global, and there are no keywords to indicate whether it is an array or a variable.
A variable can hold a value of one of the usual types or a reference to another variable (local or global), and an array can be declared either with the standard square brackets or as a reference to an existing array.

The problem is that Xtext can't tell whether a "name=reference" declaration is a variable or an array.
This is my existing grammar:

grammar org.declarations.dec.Dec with org.eclipse.xtext.common.Terminals
import "http://www.eclipse.org/emf/2002/Ecore" as ecore

generate dec "http://www.declarations.org/dec/Dec"

Model:
delarations+=(Declaration)*
;

Declaration:
Variable ";" | Array ";"
;

Variable:
    LocalVar
    | GlobalVar
;

    LocalVar:
        name=LOCALNAME "=" variableContent=VarContent
    ;

    GlobalVar:
        name=GLOBALNAME "=" variableContent=VarContent
    ;

        VarContent:
            stringContent=STRING
            | IntContent=INT
            | localRef=[LocalVar|LOCALNAME]
            | globalRef=[GlobalVar|GLOBALNAME]
        ;

Array:
    LocalArray
    |GlobalArray
;

    LocalArray:
        name=LOCALNAME "=" content=ArrayLiteral
    ;

    GlobalArray:
        name=GLOBALNAME "=" content=ArrayLiteral
    ;

        ArrayLiteral:
            "[" (c1=ArrayContent ("," c2+=ArrayContent)*)? "]"
            | localRef=[LocalArray|LOCALNAME]
            | globalRef=[GlobalArray|GLOBALNAME]
        ;

            ArrayContent:
                varContent=VarContent
                | localRef=[LocalArray|LOCALNAME]
                | globalRef=[GlobalArray|GLOBALNAME]
            ;



terminal LOCALNAME:
    "_" ('a'..'z'|'A'..'Z') ('a'..'z'|'A'..'Z'|'_'|'0'..'9')*
;
terminal GLOBALNAME:
    ('a'..'z'|'A'..'Z') ('a'..'z'|'A'..'Z'|'_'|'0'..'9')*
;

The code I want to recognise is for example:

_localVar1 = "Test";  
globalVar1 = _localVar1;  

globalArray = ["hello",globalVar1];  
nextArray = globalArray;  
anotherArray = [globalArray, nextArray];  

Does anyone have an idea how to overcome this problem?

Greetings Krzmbrzl


Solution

  • As my question was only answered in the comments, I will write an answer so that the question doesn't stay in the unanswered section. I found the solution thanks to Christian Dietrich:

    The solution was to let the parser ignore whether the declaration is a variable or an array, and also ignore whether it is local or global. The resulting grammar looks as follows:

    Model:
    elements += Code*
    ;
    
    Code:
    dec=Declaration ";"
    ;
    
    Declaration:
        name = ID "=" decCon=DecContent
    ;
    
        DecContent:
            singleContent=VarContent (op+=OPERATOR nextCon+=VarContent)*
        ;
    
            VarContent:
                num = NUMBER
                | string = STRING
                | reference = [Declaration]
                | "+"? arrayContent=ArrayLiteral
            ;
    
                ArrayLiteral:
                    con = "[" (content = VarContent ("," nextContent += VarContent)*)? "]"
                ;
    

    Figuring out whether the declaration is a variable, an array, local or global is the task of the validator.
    The rule here is: do as little as possible with the parser and as much as possible with the validator.
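To illustrate the kind of decision that moves from the parser into the validator, here is a minimal, standalone Java sketch. It is not Xtext API and not the code from my project: the class `DeclarationClassifier`, the enum `Kind` and the idea of passing the right-hand side as raw text are all assumptions made for illustration. In a real Xtext validator the same logic would walk the generated model objects instead of strings, and operators are ignored here.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of Xtext: it only shows the classification
// logic that a validator could apply once the parser accepts any declaration.
final class DeclarationClassifier {

    // Kinds the validator distinguishes; UNRESOLVED covers missing or cyclic references.
    enum Kind { VARIABLE, ARRAY, UNRESOLVED }

    // Local names start with an underscore, as described in the question.
    static boolean isLocal(String name) {
        return name.startsWith("_");
    }

    // Classifies a declaration by the text of its right-hand side.
    // 'declared' maps every declared name to its right-hand side text.
    static Kind kindOf(String rhs, Map<String, String> declared) {
        return kindOf(rhs.trim(), declared, new HashSet<>());
    }

    private static Kind kindOf(String rhs, Map<String, String> declared, Set<String> visited) {
        if (rhs.isEmpty()) {
            return Kind.UNRESOLVED;
        }
        char first = rhs.charAt(0);
        if (first == '[') {
            return Kind.ARRAY;                 // array literal
        }
        if (first == '"' || Character.isDigit(first)) {
            return Kind.VARIABLE;              // string or number literal
        }
        // Anything else is treated as a reference: follow it to its target.
        String target = declared.get(rhs);
        if (target == null || !visited.add(rhs)) {
            return Kind.UNRESOLVED;            // unknown name or reference cycle
        }
        return kindOf(target.trim(), declared, visited);
    }
}
```

With the declarations from the question in the map, `kindOf("globalArray", declared)` follows the reference to the array literal, so `nextArray = globalArray;` is classified as an array even though the parser never had to decide that.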

    Hope that this can help you out when you have a similar problem.

    Greetings Krzmbrzl