Search code examples
ccompiler-constructionabstract-syntax-treebytecodeinterpreter

Storing values in bytecode


I am in the process of writing an interpreter for a language I am creating in C. Currently it can lex the source code into tokens, and then parse these tokens into an AST. After doing some reading, I have come to the conclusion that using bytecode is faster than just walking through the AST, due to the amount of recursion required in traversing the tree.

So given an AST, how do I go about transforming this into bytecode? More specifically, where are functions, variables and constants actually stored? Are they stored in the bytecode itself, or is there a separate area of memory dedicated to storing these?

Simplified view of how my AST is implemented:

typedef enum {
    AST_NODE_INT,
    AST_NODE_FLOAT,
    AST_NODE_ADD,
    // many, many more of these
};

typedef struct _ast_node {
    ast_node_type type;

    union {
        int as_int;
        float as_float;

        struct as_add {
            struct _ast_node *left;
            struct _ast_node *right;
        };

        //more structs, representing the different types in the enum
    };
};

My program currently takes some source code such as

1 + 2

and generate an AST (this isn't C, just a representation)

{
    type: AST_NODE_ADD,
    as_add: {
        left: {
            type: AST_NODE_INT,
            as_int: 1
        },
        right: {
            type: AST_NODE_INT,
            as_int: 2
        }
    }
}

Solution

  • functions variables and constants (at least their names and whatever lookup information is needed to turn that into a value) are usually stored in a symbol table.

    In the case of a function call you would push however many args onto an evaluation stack and then have a call instruction that invokes the function's byte code.

    I recommend getting Compilers: Principles, Techniques, and Tools (2nd Edition): Alfred V Aho (aka 'the dragon book'), even though it is named "compilers" the material is directly applicable.