Search code examples
c++compiler-constructionsymbol-table

Record ownership in symbol tables


I am implementing a symbol table as described in the dragon book:

class SymbolTable {
    std::unordered_map<std::string, Record> table;
    SymbolTable* parent;

public:
    SymbolTable(SymbolTable* p) : parent{p} {}

    const Record* lookUp(const std::string& name) const {
        for (auto* scope = this; scope != nullptr; scope = scope->parent) {
            auto iter = scope->table.find(name);
            if (iter != cend(scope->table))
                return &iter->second;
        }
        return nullptr;
    }

    bool insert(const std::string& name, const Record& record) { 
        return names.insert({name, record}).second; 
    }
};

However, I am not sure how to store the record data. Who should own the type information? Should Record contain a non-owning pointer to the type already stored in the AST?

Also, I would like to keep my symbol table around for later compiler passes. Cooper & Torczon briefly mention directly inserting pointers to the appropriate SymbolTable in the AST node. Is that the common approach?


Solution

  • The lookup for names in records usually doesn't follow the bottom-up approach implemented using a parent pointer from scope to scope. (In fact, that simple datastructure may not be entirely applicable to scopes either; as soon as you introduce lexical closures, your scope relationships become more complicated.)

    Although there are languages which will do implicit lookup from a structure to the containing structure's members, they're rare and experience shows that this form of name lookup is prone to difficulty, even though it occasionally seems convenient.

    The most common pattern is that a structure type contains a list of members, each with its own type. That list of members is, in effect, a symbol table since in order to parse a member reference like r.a.b.c, you need to search for a in r's members, then b in r.a's members, and so on. That suggests that a structure type contain a symbol table of members (which might or might not be a pointer, depending on your design. Typically member lists of a structure are not shared, but in the case of OO subclass/superclass relationships, member lookup can be more complicated.)

    I guess the point I'm trying to make here is that the structure of your symbol table depends a lot on the nature of your language. At its core, a symbol table contains a list of symbols organized in a way which makes it efficient to lookup a symbol by its name. The symbol table associates each symbol with some symbol data object, which might vary from symbol table type to symbol table type (for example using C++ generics) or might be consistent across all symbol tables. Often, symbol tables differ from simple hash tables (or associative containers) by the fact that the symbols also have some kind of linear ordering, used to produce a linear representation at compile time. Precise details will vary, but being able to iterate over the symbols in a consistent, well-defined order is often an important feature.

    By the general principle of separation of concerns, a symbol table as described above should not also attempt to be a container of symbol tables. The symbol table can answer questions about the names it contains. Searching through multiple symbol tables (scope search, or whatever) is best done with a different object, which knows how to handle name lookup failure in some symbol table but doesn't need to understand the technical details of a single name lookup.

    Whether you can keep persistent pointers or references to a symbol table depends entirely on your low-level design. If that's your wish, it's easily accomplished. I think it is pretty common, but I can't speak for the huge variety of language implementations out there.

    Symbol tables do not always interrelate in simple ways which can easily be expressed as ownership. In that, they are similar to other internal objects floating around in a compiler. An AST Node might suddenly become a shared node in a graph rather than being a tree node, once you start to implement common-sub-expression optimisations. (And that's just one example.) As far as I know, most compilers of any complexity end up implementing some kind of garbage collection for internal objects, unless of course the compiler is written in a language with general garbage collection.