c++, bison, intermediate-code

bad_alloc when attempting to print string that was assigned to member of $$ struct


During our compiler's intermediate code generation phase, and more specifically while testing the arithmetic expression and assignment rules, I noticed that although the respective quads are constructed successfully, printing them out sometimes throws a bad_alloc exception. After tracing it, it looks like it's caused by the printQuads() method, specifically the following access of the key string:

if(q.result != nullptr && q.result->sym != nullptr) {
    cout << "quad " << opcodeStrings[q.op] << " inside if key check for" << opcodeStrings[q.op] << endl;
    resultKey = q.result->sym->key;
}

I'll try to include only the relevant code instead of dumping 500 lines here. Below you can see our assignexpr and basic arithmetic expression rules and actions:

expr:                           assignexpr
                            |   expr PLUS expr
                                {
                                    bool isExpr1Arithm = check_arith($1);
                                    bool isExpr2Arithm = check_arith($3);
                                    if(!isExpr1Arithm || !isExpr2Arithm)
                                    {
                                        //string msg = !isExpr1Arithm ? "First operand isn\'t a number in addition!" : "Second operand isn\'t a number in addition!";
                                        yyerror(token_node, "Both addition operands must be numbers!");
                                    } else
                                    {
                                        double result = $1->numConst + $3->numConst;
                                        $$ = newexpr(arithmetic_e);
                                        $$->sym = newtemp(scope);
                                        $$->numConst = result;
                                        emit(add, $1, $3, $$, nextquadlabel(), yylineno);
                                    }
                                }
                            |   expr MIN expr
                                {
                                    bool isExpr1Arithm = check_arith($1);
                                    bool isExpr2Arithm = check_arith($3);
                                    if(!isExpr1Arithm || !isExpr2Arithm)
                                    {
                                        //string msg = !isExpr1Arithm ? "First operand isn\'t a number in subtraction!" : "Second operand isn\'t a number in subtraction!";
                                        yyerror(token_node, "Both subtraction operands must be numbers!");
                                    } else
                                    {
                                        double result = $1->numConst - $3->numConst;
                                        $$ = newexpr(arithmetic_e);
                                        $$->sym = newtemp(scope);
                                        $$->numConst = result;
                                        emit(sub, $1, $3, $$, nextquadlabel(), yylineno);
                                    }
                                }
                            |   expr MUL expr
                                {
                                    bool isExpr1Arithm = check_arith($1);
                                    bool isExpr2Arithm = check_arith($3);
                                    if(!isExpr1Arithm || !isExpr2Arithm)
                                    {
                                        //string msg = !isExpr1Arithm ? "First operand isn\'t a number in multiplication!" : "Second operand isn\'t a number in multiplication!";
                                        yyerror(token_node, "Both multiplication operands must be numbers!");
                                    } else
                                    {
                                        double result = $1->numConst * $3->numConst;
                                        $$ = newexpr(arithmetic_e);
                                        $$->sym = newtemp(scope);
                                        $$->numConst = result;
                                        emit(mul, $1, $3, $$, nextquadlabel(), yylineno);
                                    }
                                }
                            |   expr DIV expr
                                {
                                    bool isExpr1Arithm = check_arith($1);
                                    bool isExpr2Arithm = check_arith($3);
                                    if(!isExpr1Arithm || !isExpr2Arithm)
                                    {
                                        //string msg = !isExpr1Arithm ? "First operand isn\'t a number in division!" : "Second operand isn\'t a number in division!";
                                        yyerror(token_node, "Both division operands must be numbers!");
                                    } else
                                    {
                                        if($3->numConst == 0) {
                                            yyerror(token_node, "division by 0!");
                                        } else {
                                            double result = $1->numConst / $3->numConst;
                                            $$ = newexpr(arithmetic_e);
                                            $$->sym = newtemp(scope);
                                            $$->numConst = result;
                                            emit(div_op, $1, $3, $$, nextquadlabel(), yylineno);
                                        }
                                    }
                                }
                            |   expr MOD expr
                                {
                                    bool isExpr1Arithm = check_arith($1);
                                    bool isExpr2Arithm = check_arith($3);
                                    if(!isExpr1Arithm || !isExpr2Arithm)
                                    {
                                        //string msg = !isExpr1Arithm ? "First operand isn\'t a number in modulus!" : "Second operand isn\'t a number in modulus!";
                                        yyerror(token_node, "Both modulus operands must be numbers!");
                                    } else
                                    {
                                        if($3->numConst == 0) {
                                            yyerror(token_node, "division by 0!");
                                        } else {
                                            double result = fmod($1->numConst,$3->numConst);
                                            $$ = newexpr(arithmetic_e);
                                            $$->sym = newtemp(scope);
                                            $$->numConst = result;
                                            emit(mod_op, $1, $3, $$, nextquadlabel(), yylineno);
                                        }
                                    }
                                }
...


assignexpr:                     lvalue ASSIGN expr  {   if ( isMemberOfFunc )
                                                        {
                                                            isMemberOfFunc=false;
                                                        }
                                                        else{   if ( islocalid==true ){
                                                                    islocalid = false;
                                                                }else{
                                                                    if ( isLibFunc($1->sym->key) ) yyerror(token_node,"Library function \"" + $1->sym->key + "\" is not lvalue!");
                                                                    if (SymTable_lookup(symtab,$1->sym->key,scope,false) && isFunc($1->sym->key,scope)) yyerror(token_node,"User function \"" + $1->sym->key + "\" is not lvalue!");
                                                                }
                                                        }
                                                        if($1->type == tableitem_e)
                                                        {
                                                            // lvalue[index] = expr
                                                            emit(tablesetelem,$1->index,$3,$1,nextquadlabel(),yylineno);
                                                            $$ = emit_iftableitem($1,nextquadlabel(),yylineno, scope);
                                                            $$->type = assignment;
                                                        } else
                                                        {
                                                            emit(assign,$3,NULL,$1,nextquadlabel(),yylineno); //lval = expr;
                                                            $$ = newexpr(assignment);
                                                            $$->sym = newtemp(scope);
                                                            emit(assign, $1,NULL,$$,nextquadlabel(),yylineno);
                                                        }
                                                    }
                            ;

The printQuads method is the following:

void printQuads() {
unsigned int index = 1;
cout << "quad#\t\topcode\t\tresult\t\targ1\t\targ2\t\tlabel" <<endl;
cout << "-------------------------------------------------------------------------------------------------" << endl;
for(quad q : quads) {
    string arg1_type = "";
    string arg2_type = "";
    cout << "quad before arg1 type check" << endl;
    if(q.arg1 != nullptr) {
        switch (q.arg1->type) {
            case const_bool:
                arg1_type = "\'" + BoolToString(q.arg1->boolConst) + "\'";
                break;
            case const_string:
                arg1_type = "\"" + q.arg1->strConst + "\"";
                break;
            case const_num:
                arg1_type = to_string(q.arg1->numConst);
                break;
            case var:
                arg1_type = q.arg1->sym->key;
                break;
            case nil_e:
                arg1_type = "nil";
                break;
            default:
                arg1_type = q.arg1->sym->key;
                break;
        }
    }
    cout << "quad before arg2 type check" << endl;
    if(q.arg2 !=  nullptr) {
        switch (q.arg2->type) {
            case const_bool:
                arg2_type = "\'" + BoolToString(q.arg2->boolConst) + "\'";
                break;
            case const_string:
                arg2_type = "\"" + q.arg2->strConst + "\"";
                break;
            case const_num:
                arg2_type = to_string(q.arg2->numConst);
                break;
            case nil_e:
                arg2_type = "nil";
                break;
            default:
                arg2_type = q.arg2->sym->key;
                break;
        }
    }
    string label = "";
    if(q.op == if_eq || q.op == if_noteq || q.op == if_lesseq || q.op == if_greatereq
        || q.op == if_less || q.op == if_greater || q.op == jump) label = to_string(q.label); // q.label is an unsigned int; assigning it directly would store a single char

    string resultKey = "";
    cout << "quad before key check" << endl;
    if(q.result != nullptr && q.result->sym != nullptr) {
        cout << "quad " << opcodeStrings[q.op] << " inside if key check for" << opcodeStrings[q.op] << endl;
        resultKey = q.result->sym->key;
    }
    cout << "quad after key check" << endl;
    cout << index << ":\t\t" << opcodeStrings[q.op] << "\t\t" << resultKey << "\t\t" << arg1_type << "\t\t" << arg2_type << "\t\t" << label << "\t\t" << endl;
    index++;
}
}

The quads variable is just a vector of quads. Here is the quad struct:

enum expr_t {
var,
tableitem_e,
user_func,
lib_func,
arithmetic_e,
assignment,
newtable_e,
const_num,
const_bool,
const_string,
nil_e,
bool_e
};

struct expr {
    expr_t type;
    binding* sym;
    expr* index;
    double numConst;
    string strConst;
    bool boolConst;
    expr* next;
};

struct quad {
    iopcode op;
    expr* result;
    expr* arg1;
    expr* arg2;
    unsigned int label;
    unsigned int line;
};

The binding* is defined as follows and is a symbol table binding:

enum SymbolType{GLOBAL_, LOCAL_, FORMAL_, USERFUNC_, LIBFUNC_, TEMP};

struct binding{
    std::string key;
    bool isactive = true;
    SymbolType sym;
    //vector<binding *> formals;
    scope_space space;
    unsigned int offset;
    unsigned int  scope;
    int line;
};

Here are the emit(), newtemp & newexpr() methods:

void emit(
        iopcode         op,
        expr*           arg1,
        expr*           arg2,
        expr*           result,
        unsigned int    label,
        unsigned int    line
    ){
    quad p;
    p.op            = op;
    p.arg1          = arg1;
    p.arg2          = arg2;
    p.result        = result;
    p.label         = label;
    p.line          = line;
    currQuad++;
    quads.push_back(p);
}

binding *newtemp(unsigned int scope){
    string name = newTempName();
    binding* sym = SymTable_get(symtab,name,scope);
    if (sym== nullptr){
        SymTable_put(symtab,name,scope,TEMP,-1);
        binding* sym =  SymTable_get(symtab,name,scope);
        return sym;
    }else return sym;
}

string newTempName(){
    string temp = "_t" + to_string(countertemp) + " ";
    countertemp++;
    return temp;
}

expr* newexpr(expr_t exprt){
    expr* current = new expr;
    current->sym = NULL;
    current->index = NULL;
    current->numConst = 0;
    current->strConst = "";
    current->boolConst = false;
    current->next = NULL;
    current->type = exprt;
    return current;
}

unsigned int countertemp = 0;
unsigned int currQuad = 0;

Symbol table cpp file:

#include <algorithm>
bool isHidingBindings = false;

/* Return a hash code for pcKey.*/
static unsigned int SymTable_hash(string pcKey){
  size_t ui;
  unsigned int uiHash = 0U;
  for (ui = 0U; pcKey[ui] != '\0'; ui++)
    uiHash = uiHash * HASH_MULTIPLIER + pcKey[ui];
  return (uiHash % DEFAULT_SIZE);
}

/*If b contains a binding with key pcKey, returns 1.Otherwise 0.
It is a checked runtime error for oSymTable and pcKey to be NULL.*/
int Bucket_contains(scope_bucket b, string pcKey){
    vector<binding> current = b.entries[SymTable_hash(pcKey)]; /*find the entry binding based on the argument pcKey*/
    for (int i=0; i<current.size(); i++){
        binding cur = current.at(i);
        if (cur.key==pcKey) return 1;
    }   
    return 0;
}

/*Returns the index of the bucket that corresponds to scope 'scope'. If no bucket exists
yet for that scope and create is true, creates the corresponding bucket in
oSymTable and returns its index. Otherwise returns -1.*/
int indexofscope(SymTable_T &oSymTable, unsigned int scope, bool create){
    int index=-1;
    for(int i=0; i<oSymTable.buckets.size(); i++) if (oSymTable.buckets[i].scope == scope) index=i;
    if ( index==-1 && create ){
        scope_bucket newbucket;
        newbucket.scope = scope;
        oSymTable.buckets.push_back(newbucket);
        index = oSymTable.buckets.size()-1;
    }
    return index;
}

/*If there is no binding with key pcKey in oSymTable, puts a new binding with
this key into the table and returns 1. Otherwise, it just returns 0.
It is a checked runtime error for oSymTable and pcKey to be NULL.*/
int SymTable_put(SymTable_T &oSymTable, string pcKey,unsigned int scope, SymbolType st, unsigned int line){
    int index = indexofscope(oSymTable,scope, true);
    if(index==-1) cerr<<"ERROR"<<endl;
    scope_bucket *current = &oSymTable.buckets.at(index);
    if ( Bucket_contains(*current, pcKey) && st != FORMAL_ && st != LOCAL_) return 0; /*If the binding exists in oSymTable return 0.*/
    binding newnode;
    newnode.key = pcKey;
    newnode.isactive = true;
    newnode.line =  line;
    newnode.sym = st;
    newnode.scope = scope;
    current->entries[SymTable_hash(pcKey)].push_back(newnode);
    return 1;
}

/*Takes as arguments oSymTable and the scope we want to deactivate.
If that scope does not exist in oSymTable, returns -1. Otherwise 0.*/
void SymTable_hide(SymTable_T &oSymTable, unsigned int scope){
    isHidingBindings = true;
    for(int i=scope; i >= 0; i--) {
        if(i == 0) return;
        int index = indexofscope(oSymTable,i,false);
        if(index == -1) continue;
        scope_bucket *current = &oSymTable.buckets.at(index);
        for (int i=0; i<DEFAULT_SIZE; i++) {
            for (int j=0; j<current->entries[i].size(); j++) {
                if(current->entries[i].at(j).sym == LOCAL_ || current->entries[i].at(j).sym == FORMAL_) 
                    current->entries[i].at(j).isactive = false;
            }
        }
    }
}

void SymTable_show(SymTable_T &oSymTable, unsigned int scope){
    isHidingBindings = false;
    for(int i=scope; i >= 0; i--) {
        if(i == 0) return;
        int index = indexofscope(oSymTable,i,false);
         if(index == -1) continue;
        scope_bucket *current = &oSymTable.buckets.at(index);
        for (int i=0; i<DEFAULT_SIZE; i++) {
            for (int j=0; j<current->entries[i].size(); j++) {
                if(current->entries[i].at(j).sym == LOCAL_ || current->entries[i].at(j).sym == FORMAL_) 
                    current->entries[i].at(j).isactive = true;
            }
        }
    }
}

bool SymTable_lookup(SymTable_T oSymTable, string pcKey, unsigned int scope, bool searchInScopeOnly){
    for(int i=scope; i >= 0; i--) {
        if(searchInScopeOnly && i != scope) break;
        int index = indexofscope(oSymTable,i,false);
         if(index == -1) continue;
        scope_bucket current = oSymTable.buckets[index];
        for(vector<binding> entry : current.entries) {
            for(binding b : entry) {
                if(b.key == pcKey && b.isactive) return true;
                else if(b.key == pcKey && !b.isactive) return false;
            }
        }
    }
    return false;
}

binding* SymTable_lookupAndGet(SymTable_T &oSymTable, string pcKey, unsigned int scope) noexcept{
    for ( int i=scope; i >= 0; --i ){
        int index = indexofscope(oSymTable,i,false );
        if (index==-1) continue;
        scope_bucket &current = oSymTable.buckets[index];
        for (auto &entry : current.entries) {
            for (auto &b : entry ){
                if ( b.key == pcKey ) return &b;
            }
        }
    }
    return nullptr;
}

/*Takes as arguments oSymTable, the key of the binding we are looking for, and the scope of the binding.
The function returns the value of that binding. Otherwise returns 0.*/
binding* SymTable_get(SymTable_T &oSymTable, const string pcKey, unsigned int scope){
    for ( int i=scope; i >= 0; --i )
    {
        const int index = indexofscope( oSymTable, i, false );
        if ( index == -1 )
        {
            continue;
        }

        scope_bucket& current = oSymTable.buckets[index];

        for ( auto& entry : current.entries)
        {
            for ( auto& b : entry )
            {
                if ( b.key == pcKey )
                {
                    return &b;
                }
            }
        }
    }
    return nullptr;
}

When run with the following test file, the issue occurs at the z5 = 4 / 2; expression's assign quad:

// simple arithmetic operations
z1 = 1 + 2;
z10 = 1 + 1;
z2 = 1 - 3;
z3 = 4 * 4;
z4 = 5 / 2;

What's confusing is that if I print out the sym->key after each emit() in the arithmetic-related actions, I can see the keys just fine. But once I try to access them inside the printQuads it will fail (for the div operation at least so far). This has me thinking that maybe we are shallow copying the binding* sym thus losing the key? But how come the rest of them are printed normally?

I'm thinking that the issue (which has occurred before at various stages) could be caused by us using a ton of copy-by-value instead of by-reference, but I can't exactly confirm this because most of the time it works (I'm guessing that means this is undefined behavior?).

I'm sure this is very difficult to help debug but maybe someone will eyeball something that I can't see after this many hours.


Solution

  • Debugging by eyeballing your code is probably a useful skill, but it's far from the most productive form of debugging. These days, it's much less necessary, since there are lots of good tools which you can use to detect problems. (Here, I do mean "you", specifically. I can't use any of those tools because I don't have your complete project in front of me. Nor do I particularly want it; this is not a request for you to paste hundreds of lines of code.)

    You're almost certainly right that your problem is related to some kind of undefined behaviour. If you're correct about the bad_alloc exception being thrown by what is effectively a copy of a std::string, then it's most likely the result of the thing being copied from not being a valid std::string. Perhaps it's an actual std::string object whose internal members have been corrupted; perhaps the pointer is not actually pointing to an active std::string (which I think is the real problem, see below). Or perhaps it's something else.

    Either way, the error occurred long before the bug manifests itself, so you're only going to stumble upon where it happened by blind luck. On the other hand, there are a variety of memory error detection tools available which may be able to pinpoint the precise moment in which you violated the contract by reading or writing to memory which didn't belong to you. These include Valgrind and AddressSanitizer (also known as ASan); one or both of these is certainly available for the platform on which you are developing your project. (I say that confidently even without knowing what that platform is, but you'll have to do a little research to find the one which works best for your particular environment. Both of those names can be looked up on Wikipedia.) These tools are very easy to use, and extraordinarily useful; they can save you hours or days of debugging and a lot of frustration. As an extra added bonus, they can detect bugs you don't even know you have, saving you the embarrassment of shipping a program which will blow up in the hands of the customer or the person who is marking your assignment. So I strongly recommend learning how to use them.

    I probably should leave it at that, because it's better motivation to learn to use the tools. Still, I can't resist making a guess about where the problem lies. But honestly, you will learn a lot more by ignoring what I'm about to say and trying to figure out the problem yourself.

    Anyway, you don't include much in the way of information about your SymTable_T class, and the inconsistent naming convention makes me wonder if you even wrote its code; perhaps it was part of the skeleton code you were given for this assignment. From what I can see in SymTable_put and SymTable_get, the SymTable_T includes something like a hash table, but doesn't use the C++ standard library associative containers. (That's a mistake from the beginning, IMHO. This assignment is about learning how to generate code, not how to write a good hash table. The C++ standard library associative containers are certainly adequate for your purposes, whether or not they are the absolute ideal for your use case, and they have the enormous advantages of already being thoroughly documented and debugged.)
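
    To make that suggestion concrete, here is a minimal sketch of a scoped symbol table built on std::map. The ScopedTable class and the cut-down Binding struct are hypothetical stand-ins of my own, not your SymTable_T or binding. The property that matters is that std::map does not move its elements when other elements are inserted, so a pointer handed out by put stays valid until that entry is erased.

```cpp
#include <map>
#include <string>
#include <utility>

// Cut-down stand-in for the question's binding struct (assumption: only
// the fields needed for the illustration are kept).
struct Binding {
    std::string key;
    unsigned int scope;
};

// Hypothetical scoped symbol table keyed by (scope, name).
class ScopedTable {
public:
    // Inserts the key into the given scope if it is absent and returns a
    // pointer to the stored binding either way.
    Binding* put(const std::string& key, unsigned int scope) {
        auto result = table_.emplace(std::make_pair(scope, key),
                                     Binding{key, scope});
        return &result.first->second;
    }

    // Searches from the given scope outwards to scope 0.
    Binding* get(const std::string& key, unsigned int scope) {
        for (unsigned int s = scope + 1; s-- > 0;) {
            auto it = table_.find(std::make_pair(s, key));
            if (it != table_.end()) return &it->second;
        }
        return nullptr;
    }

private:
    // std::map is node-based: inserting never relocates existing elements.
    std::map<std::pair<unsigned int, std::string>, Binding> table_;
};
```

    With something like this, a newtemp-style helper could call put and store the returned pointer in an expr without worrying about later insertions.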

    It's possible that SymTable_T was not originally written in C++ at all. The use of free-standing functions like SymTable_put and SymTable_get rather than class methods is difficult to explain unless the functions were originally written in C, which doesn't allow object methods. On the other hand, they appear to use C++ standard library collections, as evidenced by the call to push_back in SymTable_put:

    current->entries[SymTable_hash(pcKey)].push_back(newnode);
    

    That suggests that entries is a std::vector (although there are other possibilities), and if it is, it should raise a red flag when you combine it with this, from SymTable_get (whitespace-edited to save screen space here):

    for ( auto& entry : current.entries) {
        for ( auto& b : entry ) {
            if ( b.key == pcKey )
                return &b;
        }
    }
    

    (To be honest, I don't understand that double loop. To start with, you seem to be ignoring the fact that there is a hash table somewhere in that data structure, but beyond that, it seems to me that entry should be a binding (that's what SymTable_put pushes onto the entries container), and I don't see where a binding is an iterable object. Perhaps I'm not reading that correctly.)

    Regardless, evidently SymTable_get is returning a pointer to something which is stored in a container, probably a std::vector, and that container is modified from time to time by having new elements pushed onto it. And pushing a new element onto the end of a std::vector invalidates all existing references and pointers to every element of the vector whenever the push forces the vector to reallocate its storage. (See https://en.cppreference.com/w/cpp/container/vector/push_back)
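
    If you want to watch the reallocation happen, the sketch below records where a vector's elements live before and after enough push_back calls to exceed its capacity. The function name push_back_relocates is made up for the illustration. The addresses are compared as integers, so no stale pointer is ever dereferenced.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Returns true if growing the vector past its current capacity moved its
// elements to a different address.
bool push_back_relocates(std::vector<std::string>& v) {
    if (v.empty()) v.push_back("_t0");

    // Save the address of the first element as an integer, so that the old
    // pointer value is never used after it becomes invalid.
    const std::uintptr_t before =
        reinterpret_cast<std::uintptr_t>(v.data());
    const std::size_t old_capacity = v.capacity();

    // Keep pushing until the size exceeds the old capacity, which forces
    // the vector to allocate new storage and move every element into it.
    while (v.size() <= old_capacity) v.push_back("_tN");

    const std::uintptr_t after =
        reinterpret_cast<std::uintptr_t>(v.data());
    return before != after;
}
```

    Any binding* taken before the loop would point at the old storage afterwards, which is the situation I suspect your quads end up in.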

    Thus, newtemp, which returns a binding* acquired from SymTable_get, is returning a pointer which may be invalidated in the future by some call to SymTable_put (though not by every call to that function; only the ones where the stars align unhappily). That pointer is then stored into a data object which will (much later) be given to printQuads, which will attempt to use the pointer to make a copy of a string which it will attempt to print. And, as I mentioned towards the beginning of this treatise, trying to use an object which is pointed to by a dangling pointer is Undefined Behaviour.
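
    One possible repair, assuming my guess is right and the buckets really are vectors of binding, is to give every binding its own heap allocation and let the vector hold only the owning pointers. The Bucket class and the cut-down Binding struct below are hypothetical, since I haven't seen your scope_bucket definition. It's a sketch of the idea rather than a drop-in replacement.

```cpp
#include <memory>
#include <string>
#include <vector>

// Cut-down stand-in for the question's binding struct.
struct Binding {
    std::string key;
};

// Hypothetical bucket that owns its bindings through unique_ptr.
class Bucket {
public:
    // Stores a new binding and returns a pointer to it. The pointer refers
    // to the heap object, not to a slot inside the vector.
    Binding* add(const std::string& key) {
        entries_.push_back(std::unique_ptr<Binding>(new Binding{key}));
        return entries_.back().get();
    }

    // Linear search, mirroring the inner loop of SymTable_get.
    Binding* find(const std::string& key) {
        for (auto& entry : entries_) {
            if (entry->key == key) return entry.get();
        }
        return nullptr;
    }

private:
    // Reallocation moves the unique_ptr objects, but the Binding objects
    // they own stay at the same address.
    std::vector<std::unique_ptr<Binding>> entries_;
};
```

    The pointers returned by add remain usable for as long as the Bucket itself is alive, no matter how many entries are added later.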

    As a minor note, making a copy of a string in order to print it out is completely unnecessary. A reference would work just fine, and save a bunch of unnecessary memory allocations. But that won't fix the problem (if my guess turns out to be correct) because printing through a dangling pointer is just as Undefined Behaviour as making a copy through a dangling pointer, and will likely manifest in some other mysterious way.
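
    For what it's worth, the copy-free version could look something like the sketch below. Binding and Expr are cut-down stand-ins for your structs, and result_key is a helper name I made up. It returns the key by const reference and falls back to a shared empty string when there is no symbol.

```cpp
#include <string>

// Cut-down stand-ins for the question's binding and expr structs.
struct Binding {
    std::string key;
};

struct Expr {
    Binding* sym;
};

// Returns the result's key without copying it, or a reference to a shared
// empty string when the expression or its symbol is missing.
const std::string& result_key(const Expr* result) {
    static const std::string empty;
    if (result != nullptr && result->sym != nullptr) {
        return result->sym->key;
    }
    return empty;
}
```

    Inside printQuads this would be used as const string& resultKey = result_key(q.result); and streamed to cout as before. Again, it only saves the allocation; it does nothing about a dangling sym.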