
ANTLR4 - What is the correct way to define an array type?


I am creating my own grammar, and so far it has only primitive types. Now I would like to add a reference type, arrays, with a syntax similar to Java or C#, but I cannot get it to work with ANTLR.

The code example I'm working with would be similar to this:

VariableDefinition
{
    id1: string;
    anotherId: bool;
    arrayVariable: string[5];
    anotherArray: bool[6];
}

MyMethod()
{
    temp: string[3];
    temp2: string;
    temp2 = "Some text";
    temp[0] = temp2;
    temp2 = temp[0];
}

The Lexer contains:

BOOL:                   'bool';
STRING:                 'string';

fragment DIGIT:         [0-9];
fragment LETTER:        [[a-zA-Z\u0080-\u00FF_];
fragment ESCAPE :          '\\"' | '\\\\' ; // Escape 2-char sequences: \" and \\
LITERAL_INT:            DIGIT+;
LITERAL_STRING:         '"' (ESCAPE|.)*? '"' ;

OPEN_BRACKET:           '[';
CLOSE_BRACKET:          ']';
COLON:                  ':';
SEMICOLON:              ';';

ID:                     LETTER (LETTER|DIGIT)*;

My parser is an extension of this (there are more rules and other expressions, but I don't think they are related to this scenario):


global_
    : GLOBAL '{' globalVariables+=variableDefinition* '}'
    ;

variableDefinition
    : name=ID ':' type=type_ ';'                                               
    ;

type_
    : referenceType                     # TypeReference
    | primitiveType                     # TypePrimitive
    ;

primitiveType
    : BOOL                              # TypeBool
    | CHAR                              # TypeChar
    | DOUBLE                            # TypeDouble
    | INT                               # TypeInteger
    | STRING                            # TypeString
    ;

referenceType
    : primitiveType '[' LITERAL_INT ']' # TypeArray
    ;

expression_
    : identifier=expression_ '[' position=expression_ ']'      # AccessArrayExpression
    | left=expression_ operator=( '*' | '/' | '%') right=expression_      # ArithmeticExpression
    | left=expression_ operator=( '+' | '-' ) right=expression_      # ArithmeticExpression
    | value=ID                              # LiteralID
    ;

I've tried:

  • Putting spaces between the lexemes in the example program, in case there was a problem with the lexer (nothing changed).
  • Creating a rule called arrayType, referenced from type_, which in turn references type_. This fails due to left recursion; ANTLR reports: The following sets of rules are mutually left-recursive [type_, arrayType]
  • Putting primitive and reference types into a single rule:
type_
    : BOOL                              # TypeBool
    | CHAR                              # TypeChar
    | DOUBLE                            # TypeDouble
    | INT                               # TypeInteger
    | STRING                            # TypeString
    | type_ '[' LITERAL_INT ']'         # TypeArray
    ;
  • Results:

    · With whitespace separating the array (temp: string [5] ;):

line 23:25 missing ';' at '[5'
line 23:27 mismatched input ']' expecting {'[', ';'}

    · Without whitespace (temp: string[5];):

line 23:18 mismatched input 'string[5' expecting {BOOL, 'char', 'double', INT, 'string'}
line 23:26 mismatched input ']' expecting ':'

EDIT 1: This is what the parse tree looks like for the example I gave (image: Parse tree Inspector).


Solution

  • fragment LETTER:        [[a-zA-Z\u0080-\u00FF_];
    

    You're allowing [ as a letter (and thus as a character in identifiers). Because the lexer always prefers the longest match, in string[5] the text string[5 is lexed as a single identifier rather than as the keyword string followed by [ and 5, which makes the parser think the subsequent ] has no matching [. Similarly, in string [5], [5 is lexed as an identifier, so the parser sees the type followed by an unexpected identifier where it wanted [ or ;, which is also not allowed.

    To fix this, remove the extra [ from the LETTER character set, so that it reads [a-zA-Z\u0080-\u00FF_].

    As a general tip, when getting parse errors that you don't understand, you should try to look at which tokens are being generated and whether they match what you expect.
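
The fix can be sketched as a reduced, self-contained grammar. This is only an illustration under some assumptions: the grammar name ArrayTypes, the WS rule, and the trimmed-down type_ rule (only bool and string) are not from the question, which does not show how whitespace is handled or the full rule set. The only change that matters is the LETTER fragment.

```antlr
grammar ArrayTypes;   // hypothetical name, chosen for this sketch

// Parser rules (reduced from the question)
variableDefinition
    : name=ID ':' type=type_ ';'
    ;

type_
    : BOOL                              # TypeBool
    | STRING                            # TypeString
    | type_ '[' LITERAL_INT ']'         # TypeArray
    ;

// Lexer rules
BOOL:                   'bool';
STRING:                 'string';

fragment DIGIT:         [0-9];
// Fixed: the set no longer contains a literal '[', so an ID can never
// swallow an opening bracket.
fragment LETTER:        [a-zA-Z\u0080-\u00FF_];

LITERAL_INT:            DIGIT+;

OPEN_BRACKET:           '[';
CLOSE_BRACKET:          ']';
COLON:                  ':';
SEMICOLON:              ';';

ID:                     LETTER (LETTER|DIGIT)*;

// Assumption: whitespace is skipped; the question does not show this rule.
WS:                     [ \t\r\n]+ -> skip;
```

With this LETTER definition, temp: string[5]; should lex as ID, ':', STRING, '[', LITERAL_INT, ']', ';' rather than producing an ID such as string[5. One way to check the token stream is ANTLR's TestRig (often aliased as grun) run with the -tokens option, which prints every token the lexer produces before the parser runs.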