Search code examples
crecursioncs50free

How does my recursive function in C work?


I wrote a recursive function in C to free memory allocations made by a program that generates a tree of nodes. However, I can't work out how it works - but it does pass the check50. Perhaps I've not written it well enough for me to understand it.

I've tried following the debugger but I just cannot get my head around how it manages to traverse 3 generations of parents successfully.

I suppose my main questions are:

  1. How does this function free from bottom to top of the ancestry tree? E.g. how and when does free(p) get called? Is it when return is called? I know it's important not to orphan any nodes, but I don't know how this is avoided here.

  2. Following the debugger and stepping through the code, free(p) clearly isn't called until it seems that a grandparent node is reached - e.g. free_family(p->parents[0]) has been called twice and then the subsequent call to p->parents[0] is NULL, before free(p) frees that stack from the call. I can't get my head around how this works at all.

Apologies for the noob question - I'm new to coding and following CS50x currently.

I won't post the entire code unless required, but each node is defined by the following code. There are 3 generations of these nodes. The last generation of parents receives NULL parent pointers.

typedef struct person
{
    struct person *parents[2];
    char alleles[2];
} person;

My recursive free function is:

// Free `p` and all ancestors of `p`.
void free_family(person *p)
{
    // TODO: Handle base case

    if (p == NULL)
    {
        return;
    }

    // TODO: Free parents recursively

    free_family(p->parents[0]);
    free_family(p->parents[1]);

    // TODO: Free child

    free(p);
}

Solution

  • person1
      |- person2
      |    |- person4
      |    |- person5
      |- person3
      |    |- person6
      |    |- person7
    

    Let's assume this is your tree. It's inverted - person1 is the youngest, with parents person2 and person3.

    When you go to free person1, your code does the following:

     - Want to free person1, but the parents are to be freed first
     - Try to free person2
        - Try to free person4
            - person4 has no first parent, so no more recursion. Free person4
        - Try to free person5
            - person5 has no second parent, so no more recursion. Free person5
     - Now that yo're done freeing the parents of person2, you actually free person2
     - Repeat above for person3
     - Now that you are done freeing the parents for person1, actually free person1
    

    This is actually fairly simple. What you need to do to understand it is to follow the control flow. Spend some time with the code itself, and try to follow the control flow line-by-line. This isn't async or threaded code, and has a very linear execution sequence. Try not to get lost in the debugger. It's only after you have a good handle on how control flow normally works that you can make effective use of a debugger.

    Edit : Additional Notes

    There are separate questions here:

    1. How does the program avoid orphaning any nodes?
    2. Why is free() called when it is called?
    3. When does free() actually get called?

    As to 1. :

    As discussed in another answer, the program avoids orphaning any nodes by freeing the child only after freeing the parents. There is no C magic here, and nothing happening behind the scenes. As the programmer, you have put those commands to execute in that order to create that effect. You would have done so knowing that once you free a child, you lose the references to the child's parents (and by extension, all ancestors), and they can never be freed. Therefore, you make sure to free all of those before you get rid of the references to them.

    This brings us to 2. :

    free is called, in your code, after the parents are freed.

    However, you can just as easily call free() first with no other changes. Then, when you go to free the parents, you'll find nulls. At best, you'll create orphans. At worse, you'll segfault. C itself won't stop you from doing this. There may be IDEs and static code analysis tools that will warn you. The compiler might even warn you. But as C code goes, it's technically valid.

    You can avoid this problem while retaining the free-the-child-first sequence by first copying the references to the parents to other variables, then free the child, and then go free the parents. There are (rare) use cases where this kind of approach is needed and used.

    This is all entirely upto you to decide how to do it, and to get it right or shoot yourself in the foot.

    This brings us, finally, to 3:

    free is called when you, the programmer, calls it. You decide when to do this by the order in which the program is written and the various control structures like ifs and fors and function calls you have in place.

    Recursive functions are just functions. In other more complex programming languages, recursive functions are just regular functions wearing funny hats. In C, they aren't even doing that. You don't need to tell the compiler that it's recursive. Noone is acting behind the scenes to make sure you don't shoot yourself in the foot. When you call a function, as long as C can find a reference to call with that name (symbol), it will.

    There's nothing special about free() either. It's just a function call as well. It could just as easily be do_something_else() instead of free(). Other than the fact that the memory isn't going to get freed (because there's no call to free() in this hypothetical code), it'll behave in exactly the same way.