I'm working on exercise 14.2-4 of CLRS (Intro to Algorithms 3ed):
We wish to augment red-black trees with an operation RB-ENUMERATE(x, a, b) that outputs all the keys k such that a ≤ k ≤ b in a red-black tree rooted at x. Describe how to implement RB-ENUMERATE in Θ(m + lg n) time, where m is the number of keys that are output and n is the number of internal nodes in the tree. (Hint, you do not need to add new attributes to the red-black tree.)
I found an algorithm in a solution online that seems to do the job well, but I can't tell whether the complexity is really Θ(m + lg n).
RB-ENUMERATE(x, a, b)
T = red-black tree that x belongs in
nil = T.nil // sentinel NIL leaf node
if a <= x.key <= b
print(x)
if a <= x.key and x.left != nil
RB-ENUMERATE(x.left, a, b)
if x.key <= b and x.right != nil
RB-ENUMERATE(x.right, a, b)
Is this recursive algorithm really Θ(m + lg n) running time, or does this algorithm not satisfy that requirement? I see where the lg n comes from, but I'm worried about the case of m = Θ(lg n), but the running time being asymptotically more than lg n.
In particular, in each call of RB-ENUMERATE, there is either 2 recursive calls, which occurs if x falls in the interval, or 1 recursive call, which occurs if x does not fall in the interval. Hence, there are exactly m "instances" of RB-ENUMERATE which make 2 recursive calls, but the number that make 1 recursive call is unclear. What if all m of the "2-recursive" calls occur at the upper levels of the recursion tree, and they all propagate all the way down to the bottom of the red-black tree? In that case, would it not be Θ(m lg n) running time?
Even if a node is within the interval there can be 0, 1 or 2 recursive calls. A black node can have a red node on one side but a nil node on the other (and it doesn't matter which side is which). A red node will have two black children. which will match in being nil or not.
It takes up to lg(n) operations to figure out where to start printing, it then takes m operations to print the nodes of interest and then stops. The algorithm can actually do better than a strict lg(n) because it might not have to walk all the way down the tree before finding the prune point.