I'm trying to write a grammar for a kind of assembly language that doesn't use mnemonics.
I have some registers, let's say:
a, b, c, d
And one special register, which holds an address in memory:
&e
Now I want to allow assigning values to them:
a = b
d = a
c = &e
a
is also a special register (the accumulator), so some operations can be performed only on it, for example:
a = a xor d
All of these have a
on the left side and any one of the registers on the right side.
My grammar:
grammar somename;
options {
language = CSharp;
}
program: line* EOF;
line: statement (NEWLINE+ | EOF);
statement: aOperation | registerAssignment;
expression:
or #orAssignment
| xor #xorAssignment;
or:
OR reg;
xor:
XOR reg;
reg: e_read | REGISTER;
aOperation: REG_A '=' REG_A expression;
registerAssignment: reg '=' reg;
REGISTER:
REG_A
| 'b'
| 'c'
| 'd';
e_read: E_READ;
REG_A: 'a';
OR: 'or';
XOR: 'xor';
E_READ: '&e';
WHITESPACE: (' ' | '\t')+ -> skip;
NEWLINE: ('\r'? '\n' | '\r');
My problem is that the parser always reads the line a = a xor b
as a = b. On the next pass it gets the b register with nothing on the right-hand side and throws: An unhandled exception of type 'System.IndexOutOfRangeException' occurred in Program.dll: 'Index was outside the bounds of the array.'
How can I fix this?
As mentioned in the comments by sepp2k: the lexer will never produce a REG_A
token because the input 'a'
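would already be consumed by the REGISTER
rule, which is defined earlier in the grammar. When two lexer rules match the same amount of input, ANTLR picks the one defined first.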
A solution would be to remove the REGISTER
lexer rule and create a register
parser rule:
register
: REG_A
| REG_B
| REG_C
| REG_D
;
REG_A: 'a';
REG_B: 'b';
REG_C: 'c';
REG_D: 'd';
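To see why the token order matters, here is a rough Python simulation of the lexer's rule selection (longest match wins, ties go to the rule listed first). It is only a sketch, not ANTLR itself: the `tokenize` helper, the two rule tables, and the `EQ` name for the implicit '=' token are made up for illustration.

```python
import re

def tokenize(text, rules):
    """Toy lexer: the longest match wins; on a tie, the rule listed first wins."""
    tokens, pos = [], 0
    while pos < len(text):
        best_name, best_len = None, 0
        for name, pattern in rules:
            m = re.compile(pattern).match(text, pos)
            # Strictly greater: an equally long later rule never replaces an earlier one.
            if m and m.end() - pos > best_len:
                best_name, best_len = name, m.end() - pos
        if best_name is None:
            raise ValueError(f"no rule matches at position {pos}: {text[pos]!r}")
        if best_name != "WHITESPACE":  # mimics '-> skip'
            tokens.append(best_name)
        pos += best_len
    return tokens

# Order as in the question: REGISTER comes before REG_A.
# EQ is a hypothetical name for the implicit '=' token.
QUESTION_RULES = [
    ("EQ", r"="),
    ("REGISTER", r"[abcd]"),
    ("REG_A", r"a"),
    ("OR", r"or"),
    ("XOR", r"xor"),
    ("E_READ", r"&e"),
    ("WHITESPACE", r"[ \t]+"),
]

# Token set as in the answer: one token per register, no REGISTER rule.
ANSWER_RULES = [
    ("EQ", r"="),
    ("REG_A", r"a"),
    ("REG_B", r"b"),
    ("REG_C", r"c"),
    ("REG_D", r"d"),
    ("OR", r"or"),
    ("XOR", r"xor"),
    ("E_READ", r"&e"),
    ("WHITESPACE", r"[ \t]+"),
]

print(tokenize("a = a xor d", QUESTION_RULES))
# ['REGISTER', 'EQ', 'REGISTER', 'XOR', 'REGISTER']
print(tokenize("a = a xor d", ANSWER_RULES))
# ['REG_A', 'EQ', 'REG_A', 'XOR', 'REG_D']
```

With the question's rules, 'a' always comes out as REGISTER, so a parser rule that expects REG_A (such as aOperation) can never match. With separate tokens per register, the parser sees REG_A where it needs it. In the grammar itself, the reg rule would then presumably need to reference the new register parser rule instead of the removed REGISTER token.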