Search code examples
treeprobabilityasymptotic-complexityinsertionclrs

Probability and Asymptotics


Consider the case when we have n identical inputs, for a binary search tree. We randomly select from x.left and x.right, while inserting the node.

There's a question in clrs (12-1-(d)), which asks us to derive the expected running time of this set up. Intuitively, the answer is simply O(n lg n). But how do I prove it?

Any piece of advice appreciated.

Moon.


Solution

  • Imagine that you have the final tree after this process is done. Go and label each node in the tree using the numbers 1, 2, 3, ..., n so that the resulting tree is a binary search tree. Now, imagine replaying the process that constructed the tree, knowing where each element ended up. The process you'll end up tracing out ends up being equivalent to doing a regular BST insertion on each of the elements based on its numeric values. So this problem ends up being equivalent to the following problem: if you insert a random permutation of the elements 1, 2, 3, ..., n into a binary search tree, what is the expected amount of time you'll spend doing so?

    If you already know the answer to this problem, you're done. If not, one cute trick is to realize that this process ends up being equivalent to running randomized quicksort on an array of the elements 1, 2, 3, ..., n. Here's one way to see this. The root element in the tree corresponds to the first pivot element that was chosen - every element that gets inserted gets compared against it and put into either the left or right subtree. The left child of the root corresponds to the pivot chosen in the recursive call on elements less than the initial pivot - every element less than the initial pivot gets compared against the pivot for the left-hand side. You can formalize this argument using induction: prove by induction that for any array of length n, the number of comparisons made when inserting a random permutation of the elements 1, 2, 3, ..., n into a binary search tree correspond to performing quicksort on that array using pivots whose structure follows the pattern given in the tree.

    Once you have that connection, you're done, since the expected cost of randomized quicksort is Θ(n log n).