Search code examples
algorithmstreambinary-search-tree

Cracking the Coding Interview 6th Edition: 10.10. Rank from Stream


The problem statement is as follows:

Imagine you are reading in a stream of integers. Periodically, you wish to be able to look up the rank of a number x (the number of values less than or equal to x). Implement the data structures and algorithms to support these operations.That is, implement the method track (in t x), which is called when each number is generated, and the method getRankOfNumber(int x) , which returns the number of values less than or equal to X (not including x itself).

EXAMPLE: Stream(in order of appearance): 5, 1, 4, 4, 5, 9, 7, 13, 3

getRankOfNumber(1) = 0 getRankOfNumber(3) = 1 getRankOfNumber(4) = 3

The suggested solution uses a modified Binary Search Tree, where each node stores stores the number of nodes to the left of that node. The time complexity for both methods is is O(logN) for balanced tree and O(N) for unbalanced tree, where N is the number of nodes.

But how can we construct a balanced BST from a stream of random integers? Won't the tree become unbalanced in due time if we keep on adding to the same tree and the root is not the median? Shouldn't the worst case complexity be O(N) for this solution (in which case a HashMap with O(1) and O(N) for track() and getRankOfNumber() respectively would be better)?


Solution

  • you just need to build an AVL or Red-Black Tree to have the O(lg n) complexities you desire.

    about the rank, its kind of simple. Let's call count(T) the number of elements of a tree with root T.

    • the rank of a node N will be:
      • firstly there will be count(N's left subtree) nodes before N (elements smaller than N)
      • let A = N's father. If N is right son of A, then there will be 1 + count(A's left subtree) nodes before N
      • if A is right son of some B, then there will be 1 + count(B's left subtree) nodes before N
      • recursively, run all the way up until you reach the root or until the node you are in isn't someone's right son.

    as the height of a balanced tree is at most lg(n), this method will take O(lg n) to return you someone's rank ( O(lg n) to find + O(lg n) to run back and measure the rank ), but this taking in consideration that all nodes store the sizes of their left and right subtrees.

    hope that helps :)