compiler-construction, antlr4, lexer

How to create tokens in an ANTLR4 lexer .g4 file for IF () THEN abc ELSEIF () THEN xyz ELSE yzx ENDIF


I have written the lexer file and defined the tokens like this:

IF_EXPR : 'IF';
ELSEIF_EXPR : 'ELSEIF';
THEN_EXPR : 'THEN';
ELSE_EXPR : 'ELSE';

But in some cases the condition block or the THEN block can contain text (such as a variable name) that includes 'IF', and the lexer treats the 'IF' inside that text as a token.

Example:

IF abc=1 
THEN 
   xyzIF=3
ELSE 
   abc=2
ENDIF

In the above example, my lexer treats the 'IF' in xyzIF as an IF_EXPR token, but it should treat xyzIF as a single, separate token.


Solution

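  • You should define an IDENTIFIER rule that matches xyzIF. The ANTLR lexer picks the rule that matches the most input characters, and when two rules match the same length it picks the one defined first, so xyzIF becomes a single IDENTIFIER while a bare IF is still the keyword: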

    // Keywords first!
    IF : 'IF';
    
    // After keywords, define something that matches an identifier:
    IDENTIFIER : [a-zA-Z_] [a-zA-Z_0-9]*;
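
To show how this fits the whole IF ... ENDIF construct from the question, here is a minimal sketch of a complete lexer grammar. The grammar name IfLexer and the INT, EQ, LPAREN, RPAREN and WS rules are assumptions added for illustration; they are not part of the original question or answer, and the keyword token names follow the answer's shorter IF style rather than the question's IF_EXPR.

```antlr
lexer grammar IfLexer; // hypothetical grammar name

// Keywords first: when two rules match the same number of characters,
// the rule defined earlier in the file wins.
IF     : 'IF';
ELSEIF : 'ELSEIF';
THEN   : 'THEN';
ELSE   : 'ELSE';
ENDIF  : 'ENDIF';

// Identifiers after the keywords. For the input "xyzIF" this rule matches
// all five characters, so the longest match is a single IDENTIFIER.
IDENTIFIER : [a-zA-Z_] [a-zA-Z_0-9]*;

// Assumed helper tokens so the example from the question can be tokenized.
INT    : [0-9]+;
EQ     : '=';
LPAREN : '(';
RPAREN : ')';
WS     : [ \t\r\n]+ -> skip;
```

With these rules, the example from the question should tokenize as IF IDENTIFIER EQ INT THEN IDENTIFIER EQ INT ELSE IDENTIFIER EQ INT ENDIF. ELSEIF and ENDIF come out as keywords rather than identifiers because their rules tie with IDENTIFIER on length and are defined first.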