python-3.x, antlr4, mismatch

ANTLR4 - Lexer can't match a token made of fragments


I'm having trouble with this grammar:

prog : start line* end ;
start : SIMBOL START NUM NL ;

SIMBOL : [a-zA-Z]+ ;
NUM : [0-9]+ ;
START : S T A R T ;
WS : [ \t]+ -> skip ;
NL  :   '\r'? '\n' ;

fragment A : [aA] ;
fragment B : [bB] ;
fragment C : [cC] ;
fragment D : [dD] ;
fragment E : [eE] ;
fragment F : [fF] ;
fragment G : [gG] ;
fragment H : [hH] ;
fragment I : [iI] ;
fragment J : [jJ] ;
fragment K : [kK] ;
fragment L : [lL] ;
fragment M : [mM] ;
fragment N : [nN] ;
fragment O : [oO] ;
fragment P : [pP] ;
fragment Q : [qQ] ;
fragment R : [rR] ;
fragment S : [sS] ;
fragment T : [tT] ;
fragment U : [uU] ;
fragment V : [vV] ;
fragment W : [wW] ;
fragment X : [xX] ;
fragment Y : [yY] ;
fragment Z : [zZ] ;

The string I'm testing is the following:

test    start   1010
        add     30
        end     simbol

The word 'test' is matched by the SIMBOL rule, which is correct. The problem is that 'start' is not matched as a START token.

Using an ErrorListener, I get the following message in the syntaxError method:

mismatched input 'start' expecting START

The error is reported at position 1:8, which is the beginning of the word 'start'.

I'm new to ANTLR and I can't figure out where I went wrong.

BTW, I'm using ANTLR 4.7.1 with its matching runtime for Python 3.


Solution

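  • I totally forgot that the order of lexer rules matters. When two lexer rules match the same number of characters, ANTLR picks the rule that is defined first. I had the SIMBOL rule above the START rule, so the word 'start' was being tokenized as a SIMBOL token and not a START token, and the parser never saw the START it expected.

    I fixed the problem by moving the SIMBOL rule to the end of the grammar, so that it comes after START.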
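
The tie-breaking behaviour described in the solution can be illustrated with a small Python sketch. This is not ANTLR's implementation or API: `tokenize` is a hypothetical helper, and the regular expressions are hand-written approximations of the lexer rules in the question. It only mimics the two rules that matter here, namely that the longest match wins and that a tie goes to the rule listed first.

```python
import re

def tokenize(text, rules):
    """Toy lexer (an assumption, not ANTLR code): longest match wins,
    and on a tie the rule listed first keeps the match."""
    tokens = []
    pos = 0
    while pos < len(text):
        best = None  # (match length, rule name, matched text)
        for name, pattern in rules:
            m = re.compile(pattern).match(text, pos)
            # Only a strictly longer match replaces the current best,
            # so an earlier rule wins when lengths are equal.
            if m and m.end() > pos and (best is None or m.end() - pos > best[0]):
                best = (m.end() - pos, name, m.group())
        if best is None:
            raise ValueError(f"no rule matches at position {pos}")
        length, name, value = best
        if name != "WS":  # mimic the grammar's `-> skip` on WS
            tokens.append((name, value))
        pos += length
    return tokens

# Regex approximations of the lexer rules from the question.
SIMBOL = ("SIMBOL", r"[a-zA-Z]+")
NUM = ("NUM", r"[0-9]+")
START = ("START", r"[sS][tT][aA][rR][tT]")
WS = ("WS", r"[ \t]+")
NL = ("NL", r"\r?\n")

original_order = [SIMBOL, NUM, START, WS, NL]  # SIMBOL before START
fixed_order = [NUM, START, WS, NL, SIMBOL]     # SIMBOL moved to the end

line = "test    start   1010\n"
print(tokenize(line, original_order))
print(tokenize(line, fixed_order))
```

With the original order, the second token comes out as `('SIMBOL', 'start')`, which matches the reported "mismatched input 'start' expecting START" error. With SIMBOL moved to the end, the same input yields `('START', 'start')`, while 'test' is still a SIMBOL because START cannot match it.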