Tags: c, pointers, binary-search-tree, pass-by-reference, heap-memory

Inserting into a binary search tree in C


I'm currently learning C along with some data structures such as binary search trees. I have trouble understanding how exactly changing pointer values within a function works in some cases but not in others. Below is some code I wrote: an insert function that inserts values at the correct places in the BST (it works as it should). I tried working with pointers to pointers so that I could change values within a function. Even though it works, I'm still really confused about why it does. I don't quite understand why my insert function actually changes the BST even though I only work with local variables (tmp, parent_ptr) in it and don't really dereference any pointers apart from "tmp = *p2r".

Thanks for helping out.

#include <stdio.h>
#include <stdlib.h>


struct TreeNode{
    int val;
    struct TreeNode *left;
    struct TreeNode *right;
};

struct TreeNode** createTree(){
    struct TreeNode** p2r;
    p2r = malloc(sizeof(struct TreeNode*));
    *p2r = NULL;
    return p2r;
}

void insert(struct TreeNode** p2r, int val){
    // create TreeNode which we will insert
    struct TreeNode* new_node = malloc(sizeof(struct TreeNode));
    new_node -> val = val;
    new_node -> left = NULL;
    new_node -> right = NULL;
    //define onestep delayed pointer
    struct TreeNode* parent_ptr = NULL;
    struct TreeNode* tmp = NULL;
    tmp = *p2r;
    // find right place to insert node
    while (tmp != NULL){
        parent_ptr = tmp;
        if (tmp -> val < val) tmp = tmp->right;
        else tmp = tmp->left;
    }
    if (parent_ptr == NULL){
        *p2r = new_node;
    }
    else if (parent_ptr->val < val){ //then insert on the right
        parent_ptr -> right = new_node;
    }else{
        parent_ptr -> left = new_node;
    }
}

int main(){
    struct TreeNode **p2r = createTree();
    insert(p2r, 4);
    insert(p2r, 2);
    insert(p2r, 3);
    return 0;
}

Solution

  • Let's analyze the approach step by step.

    First, consider the following simple program.

    #include <stdio.h>
    #include <stdlib.h>
    
    struct TreeNode{
        int val;
        struct TreeNode *left;
        struct TreeNode *right;
    };
    
    void create( struct TreeNode *head, int val )
    {
        head = malloc( sizeof( struct TreeNode ) );
        
        head->val   = val;
        head->left  = NULL;
        head->right = NULL;
    }
    
    int main(void) 
    {
        struct TreeNode *head = NULL;
        
        printf( "Before calling the function create head == NULL is %s\n",
                head == NULL ? "true" : "false" );
                
        create( head, 10 );
        
        printf( "After  calling the function create head == NULL is %s\n",
                head == NULL ? "true" : "false" );
                
        return 0;
    }
    

    The program output is

    Before calling the function create head == NULL is true
    After  calling the function create head == NULL is true
    

    As you can see, the pointer head in main was not changed. The reason is that the function works on a copy of the value of the original pointer head, so changing the copy has no effect on the original pointer (and the memory allocated inside the function is leaked).

    If you rename the function parameter to head_parm (to distinguish the original pointer named head from the function parameter), you can picture the function definition and its call as follows:

    create( head, 10 );
    
    //...
    
    void create( /*struct TreeNode *head_parm, int val */ )
    {
        struct TreeNode *head_parm = head;
        int val = 10;
        head_parm = malloc( sizeof( struct TreeNode ) );
        //...
    

    That is, the function creates a local variable head_parm that is initialized with the value of the argument head, and it is this local variable that is changed within the function.

    In other words, function arguments are passed by value.

    To change the original pointer head declared in main you need to pass it by reference.

    In C, passing by reference is implemented by passing an object indirectly through a pointer to it. Dereferencing that pointer inside the function gives direct access to the original object.

    So let's rewrite the above program the following way.

    #include <stdio.h>
    #include <stdlib.h>
    
    struct TreeNode{
        int val;
        struct TreeNode *left;
        struct TreeNode *right;
    };
    
    void create( struct TreeNode **head, int val )
    {
        *head = malloc( sizeof( struct TreeNode ) );
        
        ( *head )->val   = val;
        ( *head )->left  = NULL;
        ( *head )->right = NULL;
    }
    
    int main(void) 
    {
        struct TreeNode *head = NULL;
        
        printf( "Before calling the function create head == NULL is %s\n",
                head == NULL ? "true" : "false" );
                
        create( &head, 10 );
        
        printf( "After  calling the function create head == NULL is %s\n",
                head == NULL ? "true" : "false" );
                
        return 0;
    }
    

    Now the program output is

    Before calling the function create head == NULL is true
    After  calling the function create head == NULL is false            
    

    In the program in your question you did not declare the pointer to the head node as in the program above:

    struct TreeNode *head = NULL;
    

    You allocated this pointer dynamically. In effect, what your program does is the following:

    #include <stdio.h>
    #include <stdlib.h>
    
    struct TreeNode{
        int val;
        struct TreeNode *left;
        struct TreeNode *right;
    };
    
    void create( struct TreeNode **head, int val )
    {
        *head = malloc( sizeof( struct TreeNode ) );
        
        ( *head )->val   = val;
        ( *head )->left  = NULL;
        ( *head )->right = NULL;
    }
    
    int main(void) 
    {
        struct TreeNode **p2r = malloc( sizeof( struct TreeNode * ) );
        *p2r = NULL;
        
        printf( "Before calling the function create *p2r == NULL is %s\n",
                *p2r == NULL ? "true" : "false" );
                
        create( p2r, 10 );
        
        printf( "After  calling the function create *p2r == NULL is %s\n",
                *p2r == NULL ? "true" : "false" );
                
        return 0;
    }
    

    The program output is

    Before calling the function create *p2r == NULL is true
    After  calling the function create *p2r == NULL is false
    

    That is, compared with the previous program, where you passed the expression &head of type struct TreeNode ** to the function create, you have now introduced an intermediate variable p2r. It holds the address of a dynamically allocated pointer, which plays the role of &head, because of this code snippet:

    struct TreeNode **p2r = malloc( sizeof( struct TreeNode * ) );
    *p2r = NULL;
    

    That is, earlier you called the function create like this:

    create( &head, 10 );
    

    Now you are in effect calling it like this:

    struct TreeNode **p2r = &head; // where head was allocated dynamically
    create( p2r, 10 );
    

    The same happens in your program. Within the function insert, dereferencing the pointer p2r gives you direct access to the pointer to the head node:

    if (parent_ptr == NULL){
        *p2r = new_node;
        ^^^^ 
    }
    

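    As a result, the function changes the pointer to the head node, which was passed by reference through the pointer p2r.

    The data members left and right of other nodes are also changed by reference, through the pointer parent_ptr, which points to the node that contains them. The operator -> is itself a dereference: parent_ptr->right means (*parent_ptr).right.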

    else if (parent_ptr->val < val){ //then insert on the right
        parent_ptr -> right = new_node;
        ^^^^^^^^^^^^^^^^^^^  
    }else{
        parent_ptr -> left = new_node;
        ^^^^^^^^^^^^^^^^^^
    }