I am creating my own grammar, and so far it has had only primitive types. I would now like to add a reference type, arrays, with a syntax similar to Java or C#, but I have not been able to make it work with ANTLR.
The code I want to parse looks similar to this:
VariableDefinition
{
    id1: string;
    anotherId: bool;
    arrayVariable: string[5];
    anotherArray: bool[6];
}
MyMethod()
{
    temp: string[3];
    temp2: string;
    temp2 = "Some text";
    temp[0] = temp2;
    temp2 = temp[0];
}
The Lexer contains:
BOOL: 'bool';
STRING: 'string';
fragment DIGIT: [0-9];
fragment LETTER: [[a-zA-Z\u0080-\u00FF_];
fragment ESCAPE : '\\"' | '\\\\' ; // Escape 2-char sequences: \" and \\
LITERAL_INT: DIGIT+;
LITERAL_STRING: '"' (ESCAPE|.)*? '"' ;
OPEN_BRACKET: '[';
CLOSE_BRACKET: ']';
COLON: ':';
SEMICOLON: ';';
ID: LETTER (LETTER|DIGIT)*;
My parser is an extension of this (there are more rules and other expressions, but I don't think they are related to this problem):
global_
: GLOBAL '{' globalVariables+=variableDefinition* '}'
;
variableDefinition
: name=ID ':' type=type_ ';'
;
type_
: referenceType # TypeReference
| primitiveType # TypePrimitive
;
primitiveType
: BOOL # TypeBool
| CHAR # TypeChar
| DOUBLE # TypeDouble
| INT # TypeInteger
| STRING # TypeString
;
referenceType
: primitiveType '[' LITERAL_INT ']' # TypeArray
;
expression_
: identifier=expression_ '[' position=expression_ ']' # AccessArrayExpression
| left=expression_ operator=( '*' | '/' | '%') right=expression_ # ArithmeticExpression
| left=expression_ operator=( '+' | '-' ) right=expression_ # ArithmeticExpression
| value=ID # LiteralID
I've tried:
· A variant with a separate arrayType rule, which ANTLR rejects with:
The following sets of rules are mutually left-recursive [type_, arrayType]
· Putting the array alternative directly into type_:
type_
: BOOL # TypeBool
| CHAR # TypeChar
| DOUBLE # TypeDouble
| INT # TypeInteger
| STRING # TypeString
| type_ '[' LITERAL_INT ']' # TypeArray
;
When parsing the example, the errors depend on whitespace:
· With whitespace (temp: string [5] ;):
line 23:25 missing ';' at '[5'
line 23:27 mismatched input ']' expecting {'[', ';'}
· Without whitespace (temp: string[5];):
line 23:18 mismatched input 'string[5' expecting {BOOL, 'char', 'double', INT, 'string'}
line 23:26 mismatched input ']' expecting ':'
EDIT 1: This is what the tree looks like when parsing the example I gave: Parse tree Inspector
fragment LETTER: [[a-zA-Z\u0080-\u00FF_];
You're allowing [ as a letter (and thus as a character in identifiers), so in string[5], string[5 is interpreted as an identifier, which makes the parser think the subsequent ] has no matching [. Similarly, in string [5], [5 is interpreted as an identifier, so the parser sees the type keyword immediately followed by an identifier where it expects [ or ;, which is also not allowed.
To fix this, remove the extra [ from LETTER:
fragment LETTER: [a-zA-Z\u0080-\u00FF_];
As a general tip, when getting parse errors that you don't understand, you should try to look at which tokens are being generated and whether they match what you expect.
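To see why the tokens come out this way, here is a minimal Python sketch that imitates how an ANTLR lexer chooses tokens: at each position the longest match wins, and on a tie the rule defined first wins. It is not ANTLR itself. The helper names make_rules and tokenize are hypothetical, only the lexer rules needed for the example line are included, and whitespace is assumed to be skipped (the question does not show a whitespace rule). With ANTLR's own tooling, the TestRig's -tokens option prints the real token stream.

```python
import re

# '[' is inside the set, exactly as in the question's LETTER fragment.
BUGGY_LETTER = r"[\[a-zA-Z\u0080-\u00FF_]"
# The same set with the stray '[' removed.
FIXED_LETTER = r"[a-zA-Z\u0080-\u00FF_]"


def make_rules(letter):
    """Hypothetical stand-in for the question's lexer rules, in grammar order."""
    return [
        ("STRING", re.compile(r"string")),
        ("LITERAL_INT", re.compile(r"[0-9]+")),
        ("OPEN_BRACKET", re.compile(r"\[")),
        ("CLOSE_BRACKET", re.compile(r"\]")),
        ("COLON", re.compile(r":")),
        ("SEMICOLON", re.compile(r";")),
        ("ID", re.compile(letter + "(?:" + letter + "|[0-9])*")),
    ]


def tokenize(text, letter):
    """Longest match wins; on equal length the earlier rule is kept."""
    rules = make_rules(letter)
    tokens = []
    pos = 0
    while pos < len(text):
        if text[pos].isspace():  # assumption: whitespace is skipped
            pos += 1
            continue
        best = None  # (rule name, end offset) of the best match so far
        for name, pattern in rules:
            match = pattern.match(text, pos)
            if match and (best is None or match.end() > best[1]):
                best = (name, match.end())
        if best is None:
            raise ValueError(f"no rule matches at offset {pos}")
        tokens.append((best[0], text[pos:best[1]]))
        pos = best[1]
    return tokens


if __name__ == "__main__":
    for label, letter in (("buggy", BUGGY_LETTER), ("fixed", FIXED_LETTER)):
        for line in ("temp: string[5];", "temp: string [5] ;"):
            print(label, repr(line), tokenize(line, letter))
```

With the buggy set, string[5 and [5 come out as single ID tokens, which matches the text quoted in the error messages. With the fixed set, both inputs produce the same STRING, OPEN_BRACKET, LITERAL_INT, CLOSE_BRACKET sequence.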