Search code examples
c++dictionaryhashmapexc-bad-access

C++ Bad access when assigning an element to map value


So the question explains the problem...

Background:

I'm trying to solve this problem from HackerRank.

It's basically an html tag parser. Valid input guaranteed, attributes are strings only.

My Approach

I created a custom Tag class that can store a map<string,Tag> of other Tag's, as well as a map<string,string> of attributes. The parsing seems to be working correctly.

The Problem

During the querying part, I get a BAD_ACCESS error on the following query/html combo:

4 1
<a value = "GoodVal">
<b value = "BadVal" size = "10">
</b>
</a>
a.b~size

The error occurs when I try to access the b Tag from a. Specifically, it's in the t=t.tags[tag_name], Line 118 below.

Code

#include <cmath>
#include <cstdio>
#include <vector>
#include <iostream>
#include <algorithm>
#include <sstream>
#include <map>
#include <stack>
using namespace std;

class Tag {
public:
    Tag(){};
    Tag(string name):name(name){};
    string name;
    map<string,Tag> tags = map<string, Tag>();
    map<string,string> attribs=map<string,string>();
};

int main() {
    int lines, queries;
    std::cin>>lines>>queries;

    std:string str;
    getline(cin, str);
    stack<string> open;
    auto tags = map<string, Tag>();
    for (int i = 0; i < lines; i++) {
        getline(cin, str);
        if (str.length()>1){
            // If it's not </tag>, then it's an opening tag
            if  (str[1] != '/') {
                // Parse tag name
                auto wordidx = str.find(" ");
                if (wordidx == -1) {
                    wordidx = str.length()-1.f;
                }
                string name = str.substr(1,wordidx-1);
                auto t = Tag(name);

                string sub = str.substr(wordidx);
                auto equalidx=sub.find("=");

                // Parse Attributes
                while (equalidx != std::string::npos) {
                    string key = sub.substr(1,equalidx-2);
                    sub = sub.substr(equalidx);
                    auto attrib_start = sub.find("\"");
                    sub = sub.substr(attrib_start+1);
                    auto attrib_end = sub.find("\"");
                    string val = sub.substr(0, attrib_end);
                    sub = sub.substr(attrib_end+1);

                    t.attribs[key] = val;
                    equalidx=sub.find("=");
                }

                // If we're in a tag, push to that, else push to the base tags
                if (open.size() == 0) {
                    tags[name] = t;
                } else {
                    tags[open.top()].tags[name]=t;
                }
                open.push(name);
            } else {
                // Pop the stack if we reached a closing tag
                auto wordidx = str.find(">");
                string name = str.substr(2,wordidx-2);

                // Sanity check, but we're assuming valid input
                if (name.compare(open.top())) {
                    cout<<"FUCK"<<name<<open.top()<<endl;
                    return 9;
                }
                open.pop();
            }

        } else {
            std::cout<<"FUCK\n";
        }
    }

    //
    // Parse in queries
    //
    for (int i = 0; i < queries; i++) {
        getline(cin, str);
        Tag t = Tag();
        bool defined = false;


        auto next_dot = str.find(".");
        while (next_dot!=string::npos) {
            string name = str.substr(0,next_dot);
            if (defined && t.tags.find(name) == t.tags.end()) {
                //TAG NOT IN T
                cout<<"Not Found!"<<endl;
                continue;
            }
            t = !defined ? tags[name] : t.tags[name];
            defined = true;

            str = str.substr(next_dot+1);
            next_dot = str.find(".");
        }

        auto splitter = str.find("~");
        string tag_name = str.substr(0,splitter);
        string attrib_name = str.substr(splitter+1);

        if (!defined) {
            t = tags[tag_name];
        } else if (t.tags.find(tag_name) == t.tags.end()) {
            //TAG NOT IN T
            cout<<"Not Found!"<<endl;
            continue;
        } else {
            t = t.tags[tag_name];
        }
        // T is now set, check the attribute
        if (t.attribs.find(attrib_name) == t.attribs.end()) {
            cout<<"Not Found!"<<endl;
        } else {
            cout<<t.attribs[attrib_name]<<endl;
        }

    }

    return 0;
}

What I've tried

This is fixed by just defining Tag x = t.tags[tag_name]; in the line above as a new variable, and then doing t = x; but why is this even happening?

Also, the following query also then fails: a.b.c~height, but it fails on Line 99 when it tried to get a.tags["b"]. No idea why. I was gonna just go with the hacky fix above, but this seems like a big core issue that i'm doing wrong.

I would suggest running this on an IDE and verifying that the parsing is indeed correct.


Solution

  • t=t.tags[tag_name]
    

    This expression is unsafe because you are copy-assigning an object that is owned by that object over the owning object.

    Consider what happens on this line:

    1. The map lookup is performed and returns a Tag&.
    2. You try to copy-assign this to t, invoking the implicit copy-assigment operator.
    3. This operator copy-assigns t.tags from the tags attribute of the copy source -- which lives in t.tags.

    The result is that the object you're copying into t is destroyed in the middle of that copy. This causes undefined behavior, and an immediate crash is honestly the best possible outcome as it told you exactly where the problem was. (This kind of problem frequently manifests at some point later in the program, at which point you've lost the state necessary to figure out what caused the UB.)

    One workaround would be to move the source object into a temporary and then move-assign that temporary over t:

    t = Tag{std::move(t.tags[tag_name])};
    

    This lifts the data we want to assign to t out of t before we try to put it in t. Then, when t's assignment operator goes to replace t.tags, the data you're trying to assign to t doesn't live there anymore.

    However, this overall approach involves a lot of unnecessary copying. It would be better to declare t as Tag const *t; instead -- have it be a pointer to a tag. Then you can just move that pointer around to point at other tags in your data structure without making copies.


    Side note: I just did this problem the other day! Here's a hint that might help you simplify things: do you actually need a structure of tags? Is there a simpler type of lookup structure that would work instead of nested tags?