Search code examples
binary-search-treecomputer-science

The expected height of a randomly filled binary search tree on n distinct keys is O(log(n))


I am working on a project for school, and the assignment has us:

  1. Implement the Tree-Insert(T,z) operation on a binary search tree.
  2. Repeatedly Insert n randomly picked numbers in a binary search tree and collect the height h.
  3. Plot on the same plot the quantity h/lg(n) versus n.

I have a completed program that outputs the value of n from 250-50K, the average height of my randomly filled BST, and the value of height(n)/log2(n). I can provide the code if needed but the pseudocode for the program looks like:

collectData()
for n = 250 to 10,000 by 250 // 250, 500, 750, …. 10,000
    sum_heightn = 0
    for j = 1 to 10 do //Take 10 measurements mj for j=1 to 10
        for i = 1 to n
           pick randomly a number p in the range [0,n]
           create a node z
           set z.key = p
           Tree-Insert(T,z)
        Measure the height hj of the tree
        Discard Tree
        sum_height += hj
collect Height(n)= sum_heightn/10 // Average height for n
Write in a file F the value n and Height(n)/log_2(n).  

I am struggling trying to understand what the value I get when calculating h(n)/log2(n) means. Can anyone help explain? Thanks.


Solution

  • A binary tree of height h can have at most 2^h - 1 nodes.

         1
        / \
       /   \
      2     3
    

    The height of the above tree is 2, and the maximum number of nodes that you can have in this tree is 2^2 - 1 = 3.

    log_2(n) for a large n denotes (approximately) the height of a completely filled tree. Similarly, you can say that asymptotically, the height of a balanced tree with n nodes will be O(log_2(n)).

    height / log_2(n) is simply a ratio which can go

    • maximum up to n / log_2(n) when the tree is skewed (when the order of insertion order is either increasing or decreasing).
    • minimum up to 1 when the tree is perfectly balanced and the height of the tree becomes log_2(n).

    You can see this ratio in the following way: It depicts the degree of skewness of the given tree. If this ration is close to n/log_2(n), then the tree is very skewed and if the ratio is close enough to 1, then the tree is much balanced.