c# grammar antlr4 lexer

Using ANTLR Parser and Lexer Separately


I used ANTLR version 4 to create a compiler. The first phase was the lexer. I created a "CompilerLexer.g4" file and put the lexer rules in it. It works fine.

CompilerLexer.g4:


lexer grammar CompilerLexer;

INT         :   'int'   ;   //1
FLOAT       :   'float' ;   //2
BEGIN       :   'begin' ;   //3
END         :   'end'   ;   //4
To          :   'to'    ;   //5
NEXT        :   'next'  ;   //6
REAL        :   'real'  ;   //7
BOOLEAN     :   'bool'  ;   //8
.
.
.
NOTEQUAL    :   '!='    ;   //46
AND         :   '&&'    ;   //47
OR          :   '||'    ;   //48
POW         :   '^'     ;   //49
ID          : [a-zA-Z]+ ;   //50




WS
:   ' ' -> channel(HIDDEN)  //50
;

Now it is time for phase 2, which is the parser. I created a "CompilerParser.g4" file and put the grammar rules in it, but I get dozens of warnings and errors.

CompilerParser.g4:


parser grammar CompilerParser;

options {   tokenVocab = CompilerLexer; }

STATEMENT   :   EXPRESSION SEMIC
        |   IFSTMT
        |   WHILESTMT
        |   FORSTMT
        |   READSTMT SEMIC
        |   WRITESTMT SEMIC
        |   VARDEF SEMIC
        |   BLOCK
        ;

BLOCK       : BEGIN STATEMENTS END
        ;

STATEMENTS  : STATEMENT STATEMENTS*
        ;

EXPRESSION  : ID ASSIGN EXPRESSION
        | BOOLEXP
        ;

RELEXP      : MODEXP (GT | LT | EQUAL | NOTEQUAL | LE | GE | AND | OR) RELEXP
        | MODEXP
        ;

.
.
.

VARDEF      : (ID COMA)* ID COLON VARTYPE
        ;

VARTYPE     : INT
        | FLOAT
        | CHAR
        | STRING
        ;
compileUnit
:   EOF
;

Warnings and errors:

  • implicit definition of token 'BLOCK' in parser
  • implicit definition of token 'BOOLEXP' in parser
  • implicit definition of token 'EXP' in parser
  • implicit definition of token 'EXPLIST' in parser
  • lexer rule 'BLOCK' not allowed in parser
  • lexer rule 'EXP' not allowed in parser
  • lexer rule 'EXPLIST' not allowed in parser
  • lexer rule 'EXPRESSION' not allowed in parser

I get dozens of these warnings and errors. What is the cause?

General questions: What is the difference between using a combined grammar and using a separate lexer and parser? How should I join the separate grammar and lexer files?


Solution

  • Lexer rules start with a capital letter, and parser rules start with a lowercase letter. In a parser grammar, you can't define tokens. Since ANTLR treats all your upper-cased rules as lexer rules, it produces these errors and warnings.
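
    Applied to the grammar in the question, the fix might look like the sketch below. It only covers rules whose bodies are shown in full and leaves the elided ones out. It assumes that SEMIC, COMA, COLON, CHAR and STRING are defined in the elided part of CompilerLexer.g4, and the body of compileUnit is a guess at the intended start rule.

    ```antlr
    parser grammar CompilerParser;

    options { tokenVocab = CompilerLexer; }

    // Parser rules start with a lowercase letter.
    // Upper-case names refer to tokens defined in CompilerLexer.g4.

    // Assumption: the entry rule should match a list of statements.
    compileUnit : statements EOF ;

    // Matches the same input as the original STATEMENT STATEMENTS*.
    statements  : statement+ ;

    // Only the alternatives whose rules appear in full in the question.
    statement   : varDef SEMIC
                | block
                ;

    block       : BEGIN statements END ;

    varDef      : (ID COMA)* ID COLON varType ;

    varType     : INT
                | FLOAT
                | CHAR
                | STRING
                ;
    ```

    The remaining rules (expression, relExp and so on) would be renamed the same way, with every reference to them updated to the lowercase name.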

    EDIT

    user2998131 wrote:

    General questions: What is the difference between using a combined grammar and using a separate lexer and parser?

    Separating the lexer and parser rules keeps things organized. Also, with separate lexer and parser grammars, you can't (accidentally) put literal tokens inside your parser grammar; you have to define all tokens in your lexer grammar. This makes it apparent which lexer rules get matched before others, and it rules out typos in recurring literal tokens. In a combined grammar, a typo like the one below silently creates a second token:

    grammar P;
    
    r1 : 'foo' r2;
    
    r2 : r3 'foo '; // added an accidental space after 'foo'
    

    But when you have a parser grammar, you can't make that mistake. You will have to use the lexer rule that matches 'foo':

    parser grammar P;
    
    options { tokenVocab=L; }
    
    r1 : FOO r2;
    
    r2 : r3 FOO;
    
    
    lexer grammar L;
    
    FOO : 'foo';
    

    user2998131 wrote:

    How should I join the separate grammar and lexer files?

    Just like you do in your parser grammar: you point to the proper tokenVocab inside the options { ... } block.

    Note that you can also import grammars, which is something different: https://github.com/antlr/antlr4/blob/master/doc/grammars.md#grammar-imports
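
    For contrast with tokenVocab, here is a minimal sketch of a grammar import. The two lexer grammars and their names (CommonTokens, MainLexer) are made up for illustration; each would live in its own .g4 file.

    ```antlr
    // File: CommonTokens.g4 (hypothetical shared rules)
    lexer grammar CommonTokens;

    ID : [a-zA-Z]+ ;
    WS : [ \t\r\n]+ -> channel(HIDDEN) ;

    // File: MainLexer.g4 (hypothetical)
    lexer grammar MainLexer;

    // Pulls the rules of CommonTokens into this grammar.
    // A lexer grammar can only import other lexer grammars.
    import CommonTokens;

    FOO : 'foo' ;
    ```

    An import merges the rules of another grammar into the importing one, while tokenVocab only makes the token names and types of a separately generated lexer available to the parser.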