compiler-construction, code-generation, yacc, symbol-table, stack-machine

How to store variables in a compiler's symbol table


For my class, I have to write a compiler for a tiny subset of Python:

  • This language has one method
  • There aren't functions, so I'm only dealing with one lexical scope

This Python subset will be translated to Java bytecode. I've already done the lexical analysis and built the parse tree (using lex and yacc). I'm stuck on the code generation.

We are using Gnoloo, a stack machine language, for the code generation.

The problem is that I don't know how to store the variables. I know I have to use a symbol table, but I don't know how to fill it.

Do I have to store the value of the variables?

If the code has x = 2, does the symbol table need a field for the value?

How can I store the variables for the stack machine?


Solution

  • You haven't said what language you're using, C++ or C.

    C++:

    Managing variables in C++ is fairly easy: you basically need one map, std::map<std::string, int> symbol_table; (assuming your variables are integers). The first time a variable is used you insert it into the map, and each time it is assigned you update its value in the map. Lookups and updates in std::map take logarithmic time, so this is fast. You would add and update these values in the actions of the Yacc parser.

    C:

    In C the situation is a little trickier, because there is no built-in map. What you need to do instead is create a binary search tree (BST). Each node in the tree holds the variable name as a character string, along with the variable's value. When you see a variable for the first time you add it to the BST; when its value changes you first find its node and then update the value stored there.

    Note: in C you also have to allocate the memory for each variable name yourself. This is usually done in Lex (luckily, the strdup function does exactly that).
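
The tree described above could be sketched roughly as follows. This is only a minimal sketch, not the contents of the original tree.h (which is not shown): the node layout and the function names tree_find, tree_set and tree_free are assumptions made for illustration. It also assumes that tree_set takes ownership of the name allocated by strdup, so the name is freed whenever it does not end up stored in a new node.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical node layout; the original tree.h is not shown. */
typedef struct node {
    char *name;          /* heap-allocated variable name (e.g. from strdup) */
    int value;           /* current value of the variable */
    struct node *left;   /* names that sort before this one */
    struct node *right;  /* names that sort after this one */
} node;

/* Return the node for `name`, or NULL if the variable is unknown. */
node *tree_find(node *root, const char *name)
{
    while (root != NULL) {
        int cmp = strcmp(name, root->name);
        if (cmp == 0)
            return root;
        root = (cmp < 0) ? root->left : root->right;
    }
    return NULL;
}

/*
 * Insert `name` with `value`, or update the value if the variable
 * already exists. Always consumes `name`: it is either stored in a
 * new node or freed. Returns the node, or NULL if allocation fails.
 */
node *tree_set(node **root, char *name, int value)
{
    node **link = root;
    node *n;

    assert(root != NULL && name != NULL);

    /* Walk down to the matching node or to the empty link where it belongs. */
    while (*link != NULL) {
        int cmp = strcmp(name, (*link)->name);
        if (cmp == 0) {
            (*link)->value = value;  /* known variable: just update it */
            free(name);              /* the duplicate name is not needed */
            return *link;
        }
        link = (cmp < 0) ? &(*link)->left : &(*link)->right;
    }

    n = malloc(sizeof *n);
    if (n == NULL) {
        free(name);
        return NULL;
    }
    n->name = name;
    n->value = value;
    n->left = NULL;
    n->right = NULL;
    *link = n;                       /* first use: attach the new node */
    return n;
}

/* Release every node and the name it owns. */
void tree_free(node *root)
{
    if (root == NULL)
        return;
    tree_free(root->left);
    tree_free(root->right);
    free(root->name);
    free(root);
}
```

With something like this in place, the Yacc action for an assignment would call tree_set with the address of the tree root, the name that came from Lex, and the computed value, while an action that reads a variable would call tree_find and report an error if it returns NULL. An unbalanced tree like this one degrades to a linked list when names arrive in sorted order, which is usually acceptable for a small class project.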

    I don't think a code example is necessary for C++, but I will give you an example in C.

    Yacc beginning:

    %{
        #include <stdio.h>
        #include <stdlib.h>
        #include "tree.h" /* Must declare the BST node type and the functions described above */

        node *map = NULL; /* Root of the symbol table; every variable is inserted here */
    %}
    

    Union:

    %union {
    
        char *s; /* We will allocate memory in Lex */
        int value; /* We will update this value in Yacc */
    };
    

    Lex:

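    [a-zA-Z_][a-zA-Z0-9_]* {
        /* Copy the name, because yytext is overwritten on the next match.
           Needs <stdio.h>, <stdlib.h> and <string.h> in the definitions section. */
        yylval.s = strdup(yytext);
        if (yylval.s == NULL) {
            fprintf(stderr, "Unable to allocate memory for variable name.\n");
            exit(EXIT_FAILURE);
        }

        return id_token; /* Token declared in the Yacc file */
    }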
    

    That is basically it. There is still work to be done to make this complete. If you have any further questions, feel free to ask.