c# antlr antlr4 antlr4cs

ANTLR parser chooses the incorrect rule before reaching the end of the line


I'm trying to write a grammar for a kind of assembler language that does not use mnemonics.

I have some registers, let's say:

a, b, c, d

And one special register, which holds an address in memory:

&e

Now I want to allow assigning values to them:

a = b
d = a
c = &e

a is also a special register (the accumulator), so some operations can be performed only on it, like:

a = a xor d

All of them have a on the left side and any one of the registers on the right side.

My grammar:

grammar somename;
options {
    language = CSharp;
}
program: line* EOF;

line: statement (NEWLINE+ | EOF);

statement: aOperation | registerAssignment;

expression:
    or #orAssignment
    | xor #xorAssignment;


xor:
    XOR reg;

reg: e_read | REGISTER;

aOperation: REG_A '=' REG_A expression;

registerAssignment: reg '=' reg;

REGISTER:
    REG_A
    | 'b'
    | 'c'
    | 'd';

e_read: E_READ;

REG_A: 'a';
OR: 'or';
XOR: 'xor';
E_READ: '&e';
WHITESPACE: (' ' | '\t')+ -> skip;
NEWLINE: ('\r'? '\n' | '\r');

Now I have a problem: the parser always treats the line a = a xor b as a = b. On the next pass the parser gets the b register with nothing on the right side, and it throws this error: An unhandled exception of type 'System.IndexOutOfRangeException' occurred in Program.dll: 'Index was outside the bounds of the array.' How can I fix this?


Solution

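  • As mentioned in the comments by sepp2k: the lexer will never produce a REG_A token, because the input 'a' is already consumed by the REGISTER rule. Both rules match the same single character, and when two lexer rules match input of the same length, ANTLR picks the one defined first, which here is REGISTER.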

    A solution would be to remove the REGISTER lexer rule and create a register parser rule:

    register
     : REG_A
     | REG_B
     | REG_C
     | REG_D
     ;
    
    REG_A: 'a';
    REG_B: 'b';
    REG_C: 'c';
    REG_D: 'd';
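
    Applied to the grammar from the question, the change might look like the sketch below. It is only a sketch under a few assumptions: the or alternative is assumed to mirror xor (the question does not show an or rule), reg is assumed to accept either &e or one of the four registers, and the options block is left out.

    ```antlr
    grammar somename;

    program: line* EOF;

    line: statement (NEWLINE+ | EOF);

    statement: aOperation | registerAssignment;

    // Accumulator-only operations, e.g. a = a xor d
    aOperation: REG_A '=' REG_A expression;

    // Plain assignments, e.g. c = &e
    registerAssignment: reg '=' reg;

    // Assumption: or has the same shape as xor
    expression
        : OR reg   #orAssignment
        | XOR reg  #xorAssignment
        ;

    reg: e_read | register;

    e_read: E_READ;

    // Parser rule that replaces the old REGISTER lexer rule,
    // so the input 'a' is always tokenized as REG_A
    register
        : REG_A
        | REG_B
        | REG_C
        | REG_D
        ;

    REG_A: 'a';
    REG_B: 'b';
    REG_C: 'c';
    REG_D: 'd';
    OR: 'or';
    XOR: 'xor';
    E_READ: '&e';
    WHITESPACE: (' ' | '\t')+ -> skip;
    NEWLINE: ('\r'? '\n' | '\r');
    ```

    With this layout each register has its own token type, so for a = a xor b the parser sees REG_A '=' REG_A XOR REG_B and can pick aOperation, while a = b still falls through to registerAssignment.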