Search code examples
c++mathlinked-listbinary-search-treestandard-deviation

Standard Deviation of Linked Binary Search Tree


I am currently working on a project for my Data Structures Course that involves a Binary Search Tree built using a doubly linked list-type format (where each node has a left, right, and parent pointer). The class managing the tree also has a root pointer and a current pointer (like a cursor).

One of the functions I need to build calculates the standard deviation of the entire tree. Unfortunately, I'm getting stuck on how to approach this function without using a standard library container (It's heavily discouraged within the course). I understand how to get the sum of the entire tree, but being able to access each number individually to compute the standard deviation is what is confusing me (since I can not store the tree as an array and access the contents sequentially.

Please forgive me if this is a basic question or I am missing some obvious approach. I tend to struggle with data structure concepts and especially when it comes to recursion. I'm not requesting someone to write the function, but at least to point me in the right direction. Thank you!

Edit: Thank you trincot for your help, I was thinking of the method the wrong way but your explanation helped!


Solution

  • I understand how to get the sum of the entire tree

    So you know how to inspect the value of each node and do something like:

    sum += current_node->value;
    

    Then keep track of a count too (unless you already keep track of the count as a list property):

    sum += current_node->value;
    node_count++;
    

    After you have visited all the nodes, you can calculate the mean:

    mean = sum / node_count;
    

    Then traverse all the nodes of the tree a second time, and calculate the sum of the squared deviations, starting from zero:

    sum_sq_dev += (current_node->value - mean) * (current_node->value - mean);
    

    After you have visited all the nodes for accumulating that sum of squared deviations, calculate the variance from it, and the standard deviation:

    variance = sum_sq_dev / node_count;
    std_dev = sqrt(variance);
    

    See Wikipedia on sample variance