Search code examples
compiler-constructionabstract-syntax-tree

Compiler AST how to implement statements and expressions


Currently working on a little toy compiler, consider this Code:

// AST base class
abstract class AST { /* codegen methods */}

// abstract classes for Statements and Expressions
abstract class Statement : AST {}
abstract class Expression : AST {}

// usage of the abstract classes
class CodeBlock : AST {
    public List<Statement> BlockStatements;
}
class BinOp : AST {
    public Expression LHS, RHS;
    public char Operator;
}

// a constant value is always an expression
class ConstantInt : Expression {
    public int Value;
}

Now to the problem how would I implement the FunctionCall class? If it is used in an expression it would be part of the expression like min(4, 5) + 3 therefor FunctionCall : Expression makes sense. But then I can't have function calls in a block like this { writeToConsole("Hello World"); } so the FunctionCall : Statement sounds reasonable but this would not work with expression syntax. Making Statement inherit from Expression would not work either since it would allow AST's like this min(4, 5) + int a.

I'd like to get suggestions on how to keep statements and expressions seperate except for things that can be both.


Solution

  • Making it so that statements are expressions is indeed not a good idea (for the reason you point out). However making it so that expressions are statements has a lot more merit.

    In fact most¹ languages' grammars have an entry similar to this:

    statement ::= expression ';'
    

    That is an expression followed by a semicolon is a statement, a so-called expression statement. In your AST you could represent this either by making Expression inherit Statement or by creating a class ExpressionStatement, which simply wraps Expression.

    Besides allowing function calls as statements, it also allows other expressions as statements. For side-effecting expressions like assignments, composite assignments or increment expressions this makes a lot of sense.

    For expressions without side effects, like simple arithmetic, it makes less sense. In C and C++ a statement like a + b; is in fact legal, but causes a warning in most compilers. In other languages there is an explicit rule that prohibits certain types of expressions to be used as statement expressions.


    ¹ Only taking into account those languages that have a distinction between statements and expressions at all, of course.