Tags: c#, compilation, abstract-syntax-tree, symbol-table, compiler-construction

Does a Symbol Table store AST (Declaration)Nodes or are the "Symbols" different objects/classes?


I have a few things about the AST / Symbol Table relation that I don't understand.

I currently have an AST implemented in C# which has nodes for variable declarations (these contain information about the name, type, source position, a possible constant value as an expression node, etc.).

Now I want to fill a symbol table (using the visitor pattern on my AST), but my question is: are the "symbols" new classes, for example VariableSymbol, or does the symbol table directly store the VariableDeclarationNode from the AST?

If the symbols are new classes, then who would store the evaluated expression value for constant variables: the VariableDeclarationNode, the VariableSymbol, or something else?

(I have seen some interpreter examples, and they store all variable values, including constants, in an additional hash table, but I'm working on a source-to-source compiler and not an interpreter, so I'm not sure where to store the evaluated constants in this case. Sorry, I know these are kind of multiple questions.)


Solution

  • are the "symbols" new classes for example VariableSymbol or does the symbol table directly store the VariableDeclarationNode from the AST?

    If the information in the AST node is sufficient for the task, then you're fine just storing references to the nodes in the scope tree/table. If you interpret from the syntax tree instead of just emitting code, you need more sophisticated data structures, and keeping a reference to the original AST node becomes a secondary concern. We've seen and done both, and both work. Not keeping references to "primitive" AST nodes at stages beyond lexing and parsing is the cleaner approach.
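To make the "separate symbol class" variant concrete, here is a minimal sketch. VariableSymbol and VariableDeclarationNode are the names from the question, but their members are my assumptions, and Scope, TryDeclare and Lookup are hypothetical names invented for this example. Your visitor would call TryDeclare whenever it visits a declaration node.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical AST node, standing in for the question's VariableDeclarationNode.
public sealed class VariableDeclarationNode
{
    public string Name { get; }
    public string TypeName { get; }
    public int Line { get; }

    public VariableDeclarationNode(string name, string typeName, int line)
    {
        Name = name;
        TypeName = typeName;
        Line = line;
    }
}

// Separate symbol class: holds the semantic facts and only points back at the node.
public sealed class VariableSymbol
{
    public string Name { get; }
    public string TypeName { get; }
    public VariableDeclarationNode DeclaringNode { get; }

    public VariableSymbol(VariableDeclarationNode node)
    {
        Name = node.Name;
        TypeName = node.TypeName;
        DeclaringNode = node;
    }
}

// One scope of the symbol table; lookups fall back to the enclosing scope.
public sealed class Scope
{
    private readonly Dictionary<string, VariableSymbol> _symbols =
        new Dictionary<string, VariableSymbol>();

    public Scope Parent { get; }

    public Scope(Scope parent = null) { Parent = parent; }

    // Returns false if the name is already declared in this scope.
    public bool TryDeclare(VariableSymbol symbol)
    {
        if (_symbols.ContainsKey(symbol.Name)) return false;
        _symbols.Add(symbol.Name, symbol);
        return true;
    }

    // Walks outward through the enclosing scopes; returns null if nothing is found.
    public VariableSymbol Lookup(string name)
    {
        for (Scope scope = this; scope != null; scope = scope.Parent)
        {
            if (scope._symbols.TryGetValue(name, out VariableSymbol found))
                return found;
        }
        return null;
    }
}
```

If the AST node already carries everything you need, the dictionary could just as well map names to VariableDeclarationNode directly; the separate class pays off once later stages attach information that does not belong in the syntax tree.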

    [would it] be dirty to store the evaluated constant values (for the special case) in the symbol classes or should I create an additional table for these?

    That really depends, too... If you envision the constant value as an inherent property of the declaration, store it in your symbol descriptor class:

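    class Symbol : ISymbol {
        public ASTNode DeclaringNode;        // declaration this symbol came from
        public SymbolType RuntimeType;       // type of the declared variable
        public bool InitializeAsConstant;    // true for constant declarations
        public RuntimeValue ConstantValue;   // evaluated value; only meaningful for constants

        // ...
    }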
    

    If you instead keep the rvalue expression that makes up the initializer, so that you can replicate the declaration verbatim in the target language, then treat the constant like a variable until the end of the process:

    /* fantasy source language */
    Constant $$IAMCONSTANT :=> /03\ MUL /02\ KTHXBYE
    
    /* target language */
    const int IAMCONSTANT = 3 * 2;
    
    /* as opposed to compilation stage 1 precomputed */
    const int IAMCONSTANT = 6;
    

    The first form of the target output (const int IAMCONSTANT = 3 * 2;) is easier for the source-to-source case, because you may get away without computing the values of expressions in the compiler at all.
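Here is a rough sketch of the two emission strategies side by side. All names in it (Expr, IntLiteral, Multiply, ConstantSymbol, EmitVerbatim, EmitFolded) are hypothetical, and it assumes integer literals and multiplication are the only expression forms, so treat it as an illustration rather than a complete design.

```csharp
using System;
using System.Globalization;

// Hypothetical expression nodes; only integer literals and multiplication are covered.
public abstract class Expr
{
    public abstract string Render();   // text of the expression in the target language
    public abstract int Evaluate();    // value computed inside the compiler
}

public sealed class IntLiteral : Expr
{
    public int Value { get; }
    public IntLiteral(int value) { Value = value; }
    public override string Render() => Value.ToString(CultureInfo.InvariantCulture);
    public override int Evaluate() => Value;
}

public sealed class Multiply : Expr
{
    public Expr Left { get; }
    public Expr Right { get; }
    public Multiply(Expr left, Expr right) { Left = left; Right = right; }

    // A real emitter would add parentheses based on operator precedence.
    public override string Render() => Left.Render() + " * " + Right.Render();

    // checked: an overflow throws instead of silently wrapping around.
    public override int Evaluate() => checked(Left.Evaluate() * Right.Evaluate());
}

// The symbol keeps the initializer expression; no separate value table is needed.
public sealed class ConstantSymbol
{
    public string Name { get; }
    public Expr Initializer { get; }

    public ConstantSymbol(string name, Expr initializer)
    {
        Name = name;
        Initializer = initializer;
    }

    // Replicates the initializer verbatim; the compiler never computes a value.
    public string EmitVerbatim() =>
        "const int " + Name + " = " + Initializer.Render() + ";";

    // Folds the initializer to a single value before emitting.
    public string EmitFolded() =>
        "const int " + Name + " = " +
        Initializer.Evaluate().ToString(CultureInfo.InvariantCulture) + ";";
}
```

For a ConstantSymbol named IAMCONSTANT whose initializer is new Multiply(new IntLiteral(3), new IntLiteral(2)), EmitVerbatim() gives "const int IAMCONSTANT = 3 * 2;" and EmitFolded() gives "const int IAMCONSTANT = 6;". Only the folded path needs Evaluate(), which is why the verbatim path is less work for a source-to-source compiler.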