algorithm binary-search-tree tree-traversal range-query

Find all subtrees in a BST whose keys lie in a given range

I was given this question during a recent interview: Given a BST whose nodes contains an Integer as value, find all subtrees whose nodes fall between integers X (min) and Y (max), where X<Y. These subtrees cannot overlap each other.

I have solved variations of this problem, example - print keys of a BST that fall in a given range. But couldn't figure this one out, since it involves finding all connected sub-graphs of the main graph/tree that satisfy very specific constraints. Any pointers/help/pseudo code is appreciated.

Added notes -

The problem defined the datastructure of node as having a left pointer, right pointer, and a integer value. There was no way to mark a node.
Was asked to solve this in Java.
When i said subtree/subgraph, i meant a connected set of nodes, not a list of disjoint nodes. sorry for the confusion.

Solution

The concrete solution depends on the definition of a subtree. Consider the following BST:

And we want to find the subtrees in the range [4,8]. It is obvious that the 4 node belongs to the output. But what about the other half tree? If a subtree refers to a node with all of its children, then that's the entire result. If a subtree is actually a subset of the input nodes, the nodes 5 and 8 belong to the result but their connections to the 3 and 9 nodes have to be stripped away.

In any case, the following algorithm can handle both. The preprocessor define WHOLE_SUBTREES defines whether subtrees are entire subcomponents with all children.

static List<BSTNode> FindSubtreesInRange(BSTNode root, int rangeMin, int rangeMax)
{
    var result = new List<BSTNode>();
    if (IsTreeWithinRange(root, rangeMin, rangeMax, int.MinValue, int.MaxValue, result))
        result.Add(root);
    return result;
}

static bool IsTreeWithinRange(BSTNode root, int rangeMin, int rangeMax, int treeRangeMin, int treeRangeMax, List<BSTNode> resultList)
{
    if (treeRangeMin >= rangeMin && treeRangeMax <= rangeMax)
        return true;
    if ( treeRangeMin > rangeMax || treeRangeMax < rangeMin)
        return false;

    if (root.Key < rangeMin)
    {
        if (root.Right != null && IsTreeWithinRange(root.Right, rangeMin, rangeMax, root.Key + 1, treeRangeMax, resultList))
            resultList.Add(root.Right);
        return false;
    }

    if (root.Key > rangeMax)
    {
        if (root.Left != null && IsTreeWithinRange(root.Left, rangeMin, rangeMax, treeRangeMin, root.Key, resultList))
            resultList.Add(root.Left);
        return false;
    }

    if (root.Left == null && root.Right == null)
        return true;

    if (root.Left == null)
    {
#if WHOLE_SUBTREES
        if (!IsTreeWithinRange(root.Right, rangeMin, rangeMax, root.Key + 1, treeRangeMax, resultList))
            root.Right = null;
        return true;
#else
        return IsTreeWithinRange(root.Right, rangeMin, rangeMax, root.Key + 1, treeRangeMax, resultList);
#endif
    }

    if (root.Right == null)
    {
#if WHOLE_SUBTREES
        if (!IsTreeWithinRange(root.Left, rangeMin, rangeMax, treeRangeMin, root.Key, resultList))
            root.Left = null;
        return true;
#else
        return IsTreeWithinRange(root.Left, rangeMin, rangeMax, treeRangeMin, root.Key, resultList);
#endif
    }

    var leftInRange = IsTreeWithinRange(root.Left, rangeMin, rangeMax, treeRangeMin, root.Key, resultList);
    var rightInRange = IsTreeWithinRange(root.Right, rangeMin, rangeMax, root.Key + 1, treeRangeMax, resultList);

    if (leftInRange && rightInRange)
        return true;

#if WHOLE_SUBTREES
    if (!leftInRange)
        root.Left = null;
    if (!rightInRange)
        root.Right = null;
    return true;   
#else
    if (leftInRange)
        resultList.Add(root.Left);
    if (rightInRange)
        resultList.Add(root.Right);
    return false;
#endif

}

The idea is as follows: If only one subtree of a given node lies within the given range, then this must be the root of a new subtree. If both lie in the range, then they are not the root of a subtree. Instead, the parent level should handle the according decision.

The algorithm starts with the following: We traverse the tree and remember in which ranges the keys may be (treeRangeMin/Max). This allows a fast check if an entire subtree lies in the given range (first statement of the IsTreeWithinRange method.

The next two statements handle the case if the current node's key lies outside the given range. Then only one of it's subtrees might be within the range. If that's the case, this subtree is added to the result list.

Next, we check if the subtrees exist. If both do not, then the current tree is completely contained within the range.

If only one subtree exists, then the action differs based on whether we may split trees. If we may split the tree, the following happens: If the subtree is not within the range, we cut it off and return true (because the current node is contained within the given range). If we may not split trees, we just propagate the result of the recursive call.

Lastly, if both children exist. If one of them is not contained within the range, we cut it off (if we are allowed to). If we are not allowed, we add the subtree to the result list that lies within the given range.