Search code examples
cparsingbisonyaccbnf

Bison recursion behaving strangely with linked list


I have a linked list-like struct:

typedef struct NODE
{
  NODE_TYPES type;
  struct NODE* curr;
  struct NODE* next;
} NODE;

and I have this recursive rule:

lines: stmt             {
                            root -> next = $1;
                            $1 -> next = NULL;
                            $$ = $1;

                        }
| lines stmt            {
                            $1 -> next = $2;
                            $2 -> next = NULL;
                            $$ = $2;

                        }
;

stmt is always a pointer to a NODE struct.

root is defined as NODE *root and then later, in the main function, allocated as such: root = malloc(sizeof(NODE));

However, printing my node list recursively starting from root, I only get the last node back. Why am I losing the data in between root and the last node?


Solution

  • Apparently the problem was that you were setting the semantic value of stmt to point to a stack-allocated local variable. Since the local variable's lifetime expires as soon as the semantic action is done, that value is a dangling pointer and using it is Undefined Behaviour. (But in this case you will probably find that all the pointers refer to the same block of memory, whose contents will change every time that particular stack frame is reused.)

    You need to ensure that the value of stmt is a pointer to a newly-allocated NODE:

    stmt : /* ... something ... */ 
                      { $$ = malloc(sizeof *$$);
                        $$ = (NODE){/* ... */, .next = NULL; }
    

    This is important. There's no good way of doing this without some kind of function which heap-allocates a new NODE. You can try to reduce malloc overhead by keeping a pool of NODEs and recycling them yourself, but since most modern malloc implementations do a good job of recycling small fixed-size allocations, it's premature optimization.

    The linked list code will work for simple statements, but relying on a global variable to maintain the head of the list is not very scalable. It will make it impossible to parse compound statements (conditionals and loops, for example) because you'll end up reusing root for the internal statement lists. It's better to just build the list in the natural way (which will be backwards) and reverse it at the end:

    stmts: stmt       /* Default action is fine */
         | stmts stmt { $$ = $2; $$->next = $1; }
    line : stmts      { $$ = reverse($$); }
    

    In this simple code, each time stmts: stmts stmt is reduced, it pushes the new NODE onto the front of the list, which obviously creates the list in reverse order. So when line is reduced (which will be after all the stmts have been parsed), the list needs to be reversed. You can reverse in place:

    NODE* reverse(NODE* last) {
      NODE* next = NULL;
      while (last) {
        NODE* tmp = last->next; 
        last->next = next;
        next = last;
        last = tmp;
      }
      return next;
    }
    

    That completely eliminates the need for the global root, so it is then possible for a compound statement to be correctly assembled.