How to Restrict return Statements to Function Declarations in ANTLR Grammar?


Question: I'm working on a custom parser using ANTLR to define a small programming language. One of the requirements is that return statements can only appear inside the body of a function. If a return statement appears outside a function, the parser should throw an error.

Here's the simplified grammar I'm working with (in ANTLR):

grammar Grammar;

options {
    language=Python3;
}

// Parser Rules
program: (var_decl | fun_decl)*;

fun_decl: type_spec ID '(' (param_decl (';' param_decl)*)? ')' body; // Function declarations
param_decl: type_spec ID (',' ID)* ; // Parameters for functions
type_spec: 'int' | 'float' ; // Valid types

body: '{' stmt* '}'; 
expr: 'expr';

stmt: assignment | call | r_return | var_decl;
var_decl: param_decl ';'; // Variable declarations
assignment: ID '=' expr ';';
call: ID '(' (expr (',' expr)*)? ')' ';';
r_return: 'return' expr ';';

// Lexer Rules
WS: [ \t\r\n] -> skip ; // Skip whitespace
ID: [a-zA-Z]+ ; // Identifiers (variable and function names)
ERROR_CHAR: . {raise ErrorToken(self.text)} ; // Error handling

The issue is that this grammar allows return statements (r_return) to appear anywhere a stmt is allowed, including in the global scope. For example:

int x;
return x; // This should throw an error.

But inside a function, it should work:

int myFunction() {
    return 42; // Valid
}

I have thought about this but could not come up with a solution. How can I make the parser reject a return statement that appears outside a function?


Solution

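  • Your grammar already restricts r_return to function bodies: program accepts only var_decl and fun_decl, and stmt is reachable only through body. What is missing is EOF at the end of your program parser rule...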

    program: (var_decl | fun_decl)* EOF;
    

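    ...so that the parser must consume the entire input. Without EOF, the parser matches the longest valid prefix (int x;), stops there, and silently ignores the rest. With EOF, the return in your first test case is reported as a syntax error.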
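To make the effect of EOF concrete, here is a minimal hand-written sketch in Python. It is not ANTLR-generated code: the `Parser` class, the `parse` helper, and the `require_eof` flag are hypothetical names, and the grammar is cut down to declarations, empty parameter lists, and `return ID ;`. It only mimics how a start rule without EOF accepts a valid prefix and stops.

```python
import re

TYPES = {"int", "float"}
KEYWORDS = TYPES | {"return"}

def tokenize(src):
    # Letters-only words and single-character punctuation; everything else is skipped.
    return re.findall(r"[A-Za-z]+|[(){};]", src)

class Parser:
    def __init__(self, tokens, require_eof):
        self.toks, self.pos, self.require_eof = tokens, 0, require_eof

    def peek(self):
        return self.toks[self.pos] if self.pos < len(self.toks) else None

    def expect(self, tok):
        if self.peek() != tok:
            raise SyntaxError(f"expected {tok!r}, got {self.peek()!r}")
        self.pos += 1

    def ident(self):
        tok = self.peek()
        if tok is None or not tok.isalpha() or tok in KEYWORDS:
            raise SyntaxError(f"expected identifier, got {tok!r}")
        self.pos += 1

    def program(self):
        # program: (var_decl | fun_decl)* EOF?   (EOF only if require_eof is set)
        while self.peek() in TYPES:
            self.decl()
        if self.require_eof and self.peek() is not None:
            raise SyntaxError(f"unexpected {self.peek()!r} at top level")
        return self.pos  # number of tokens consumed

    def decl(self):
        self.pos += 1  # type_spec, already checked by the caller
        self.ident()
        if self.peek() == "(":  # fun_decl with an empty parameter list
            self.expect("(")
            self.expect(")")
            self.expect("{")
            while self.peek() != "}":
                self.stmt()
            self.expect("}")
        else:  # var_decl
            self.expect(";")

    def stmt(self):
        # stmt: r_return | var_decl   (only reachable from a function body)
        if self.peek() == "return":
            self.pos += 1
            self.ident()
            self.expect(";")
        elif self.peek() in TYPES:
            self.pos += 1
            self.ident()
            self.expect(";")
        else:
            raise SyntaxError(f"unexpected {self.peek()!r} in body")

def parse(src, require_eof):
    return Parser(tokenize(src), require_eof).program()

print(parse("int x; return x;", require_eof=False))  # 3: stops after "int x ;"
print(parse("int f() { return x; }", require_eof=True))  # 9: whole input consumed
```

Calling `parse("int x; return x;", require_eof=True)` raises `SyntaxError` at the top-level `return`, which corresponds to the error the EOF version of the grammar reports.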

    Not directly related to your question, I suggest defining lexer rules such as...

    OPEN_PAREN: '(';
    CLOSE_PAREN: ')';
    SEMICOLON: ';';
    COMMA: ',';
    OPEN_CURLY: '{';
    CLOSE_CURLY: '}';
    EQ: '=';
    INT: 'int';
    FLOAT: 'float';
    RETURN: 'return';
    

    ...to use in your parser rules instead of character literals.
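One caveat when moving keywords into named lexer rules: the ANTLR lexer prefers the longest match and, when two rules match the same text, the rule defined first. Keyword rules such as INT, FLOAT, and RETURN therefore need to appear before ID, or every keyword will be tokenized as ID. The Python sketch below only mimics that tie-breaking with regular expressions. The `lex` function and the two rule lists are hypothetical and not part of ANTLR.

```python
import re

def lex(rules, text):
    """Tokenize text: longest match wins, ties go to the rule listed first."""
    tokens = []
    pos = 0
    while pos < len(text):
        best_name, best_len = None, 0
        for name, pattern in rules:
            m = re.compile(pattern).match(text, pos)
            # Only a strictly longer match replaces the current best,
            # so on a tie the earlier rule is kept.
            if m and len(m.group()) > best_len:
                best_name, best_len = name, len(m.group())
        if best_name is None:
            raise ValueError(f"no rule matches at position {pos}")
        if best_name != "WS":  # whitespace is skipped, as in the grammar
            tokens.append((best_name, text[pos:pos + best_len]))
        pos += best_len
    return tokens

COMMON = [("SEMICOLON", r";"), ("WS", r"[ \t\r\n]+")]
KEYWORDS_FIRST = [("RETURN", r"return"), ("INT", r"int"), ("ID", r"[a-zA-Z]+")] + COMMON
ID_FIRST = [("ID", r"[a-zA-Z]+"), ("RETURN", r"return"), ("INT", r"int")] + COMMON

print(lex(KEYWORDS_FIRST, "return x;"))
# [('RETURN', 'return'), ('ID', 'x'), ('SEMICOLON', ';')]
print(lex(ID_FIRST, "return x;"))
# [('ID', 'return'), ('ID', 'x'), ('SEMICOLON', ';')]
```

Longer identifiers that merely start with a keyword, such as returns, still come out as ID in both orderings because the longest match takes priority.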