Search code examples
cstringtreesegmentation-faultunions

segfault when printing the third string in a parse tree


I am creating a compiler, and beginning to create my parse tree structure.

I have this for a 'node' which can contain subnodes, or not.

typedef struct Node {
    int node_type;
    union {
        char* string;
        int number;
        struct Node* nodes;
    } node_data;
} Node;

And these functions assemble/print it

Node* MakeNodeFromString(char* mystring) {
    Node* mynode = (Node*) malloc(sizeof(Node));
    mynode->node_data.string = strdup(mystring);
    mynode->node_type = 0; // @TODO not 3
    return mynode;
}

Node* MakeTwoBranchNode(int nodetype, Node* a, Node* b) {
    Node* mynode = (Node*) malloc(sizeof(Node));
    mynode->node_type = 2; // @TODO not 3
    mynode->node_data.nodes = malloc(2 * sizeof(Node*));
    mynode->node_data.nodes[0] = *a; mynode->node_data.nodes[1] = *b;
    return mynode;
}

void printtree (Node *n, int level) {
    if (!n) return;

    printf ("start %d\n", n->node_type);
    switch (n->node_type) {
        case 2:
            printf ("%*c2\n", level, ' ');
            printtree (&n->node_data.nodes[0], level+1);
            printtree (&n->node_data.nodes[1], level+1);
            break;
        case 0:
            printf ("%*c%s\n", level, ' ', n->node_data.string);
            break;
    }
    printf ("end %d\n", n->node_type);
}

And whenever I assemble a tree I get segfaults either printf'ing or strlen'ing my strings. I've tried strdup, strcpy, etc. I'm pretty sure its not MakeTwoBranchNode that's failing because I can create large trees of numbers (code not included). But I'm not sure.

This is a codesample of where it does - and doesnt - segfault on my machine

int main() {
    // Works
    printtree(
        MakeTwoBranchNode(3,
            MakeNodeFromString("first string"),
            MakeNodeFromString("second string")
        ),
        1
    );
    // Fails
    printtree(
        MakeTwoBranchNode(3,
            MakeTwoBranchNode(3,
                MakeNodeFromString("first string"),
                MakeNodeFromString("second string")
            ),
            MakeNodeFromString("third string")
        ),
        1
    );
}

If you run this example (and can understand its cryptic output) you'll see it segfaults during printf(n->node_data.string).


Solution

  • You're allocating sizeof-a-pointer, not sizeof-a-node in the following:

    Node* MakeTwoBranchNode(int nodetype, Node* a, Node* b) {
        Node* mynode = malloc(sizeof(Node));
        mynode->node_type = 2; // @TODO not 3
        mynode->node_data.nodes = malloc(2 * sizeof(Node*)); // HERE
        mynode->node_data.nodes[0] = *a; mynode->node_data.nodes[1] = *b;
        return mynode;
    }
    

    You could change the commented line above to:

        mynode->node_data.nodes = malloc(2 * sizeof(Node));
    

    However, as written your test program as a memory leak. What happens to the memory you allocated for the nodes your passing to MakeTwoBranchNode()? This is not what I think you really want. You wold be better off using a pointer-array in the first place.

    typedef struct Node {
        int node_type;
        union {
            char* string;
            int number;
            struct Node *nodes[2];
        } node_data;
    } Node;
    

    Then save the actual pointers passed to MakeTwoBranchNode. In doing so you're passing ownership of those nodes to the two-branch node (and thus you should also ensure when it is freed it properly cleans up its child nodes):

    Node* MakeTwoBranchNode(int nodetype, Node* a, Node* b) {
        Node* mynode = malloc(sizeof(Node));
        mynode->node_type = 2; // @TODO not 3
        mynode->node_data.nodes[0] = a; 
        mynode->node_data.nodes[1] = b;
        return mynode;
    }
    

    Now there's no memory leak unless you fail to free() the pointers in nodes[0] and nodes[1] when you're destroying the two-branch node.