On Leetcode, Question #572 prompts the user to check whether a particular tree (given by its root) is a subtree of another tree (given by its root). Below is an implementation to solve this problem:
    # Definition for a binary tree node (provided by LeetCode):
    # class TreeNode:
    #     def __init__(self, val=0, left=None, right=None):
    #         self.val = val
    #         self.left = left
    #         self.right = right
    class Solution:
        def isSubtree(self, root: Optional[TreeNode], subRoot: Optional[TreeNode]) -> bool:
            def isSameTree(p, q):
                # Two trees are identical if both are empty, or their roots
                # match and their left and right subtrees are identical.
                if not p and not q:
                    return True
                if not p or not q or p.val != q.val:
                    return False
                return isSameTree(p.left, q.left) and isSameTree(p.right, q.right)

            def recurse(root, subRoot):
                # Try to match the subtree rooted at every node of the parent tree.
                if not root:
                    return False
                if isSameTree(root, subRoot):
                    return True
                return recurse(root.right, subRoot) or recurse(root.left, subRoot)

            return recurse(root, subRoot)
In effect, recurse() is called once for each node in the parent tree. During a given call, isSameTree() attempts to "superimpose" the subtree onto the parent tree, starting at the current node. If it can successfully match all components of the subtree, isSameTree() returns True; otherwise, it returns False.
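To make the walkthrough concrete, here is a self-contained sketch of the same approach as free functions (the TreeNode definition is an assumption matching LeetCode's usual one, and the snake_case names are mine):

```python
from typing import Optional

class TreeNode:
    # Minimal stand-in for LeetCode's TreeNode.
    def __init__(self, val=0, left=None, right=None):
        self.val, self.left, self.right = val, left, right

def is_same_tree(p: Optional[TreeNode], q: Optional[TreeNode]) -> bool:
    # Identical if both empty, or roots match and both subtrees match.
    if not p and not q:
        return True
    if not p or not q or p.val != q.val:
        return False
    return is_same_tree(p.left, q.left) and is_same_tree(p.right, q.right)

def is_subtree(root: Optional[TreeNode], sub_root: Optional[TreeNode]) -> bool:
    # Try to superimpose sub_root at every node of root.
    if not root:
        return False
    if is_same_tree(root, sub_root):
        return True
    return is_subtree(root.left, sub_root) or is_subtree(root.right, sub_root)

# parent tree:   3        subtree:   4
#               / \                 / \
#              4   5               1   2
#             / \
#            1   2
root = TreeNode(3, TreeNode(4, TreeNode(1), TreeNode(2)), TreeNode(5))
sub = TreeNode(4, TreeNode(1), TreeNode(2))
print(is_subtree(root, sub))          # True
print(is_subtree(root, TreeNode(7)))  # False
```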
While the implementation is a form of brute force, I am having a harder time discussing its time and space complexity. For the purposes of analysis, let the size of the potential parent tree be N nodes and the size of the potential subtree be M nodes.
When it comes to time complexity, we can quickly conclude that there will be N calls to recurse(), each with the potential to generate at most M calls to isSameTree(). Therefore, at worst, there will be N*M total calls. Apart from the recursive component, each function call takes constant time, so we can say that this algorithm is worst-case O(NM). While this makes sense, I feel as though there may be a tighter bound. In other words, for an arbitrary N-node parent tree and M-node subtree, it is impossible for every one of the N recurse() calls to lead to M isSameTree() calls; for instance, leaf nodes naturally terminate after the first isSameTree() call, so if M is greater than 1, the recurse() calls associated with leaf nodes will result in fewer than M calls. I say all of this to ask: is there a way to find a tighter bound?
When it comes to auxiliary space complexity, it is perhaps even more complicated. Some folks are quick to assert that the space complexity is O(LogN + LogM), but, again, this does not make sense to me. At any given time, we can have a maximum of LogN recurse() calls on the call stack, and a maximum of LogM isSameTree() calls on the call stack (assuming balanced trees). While these two statements are individually true, they are never simultaneously true for arbitrary N and M. That is, to achieve LogM isSameTree() calls on the call stack, the most recent recurse() call needs to be on a node that is at least LogM levels above the leaf nodes of the parent tree; therefore, the deepest level that node can occupy is LogN - LogM. It follows that there can only be LogN - LogM recurse() calls and LogM isSameTree() calls on the call stack at a given time; in total, that is LogN calls. On the flip side, for there to be LogN recurse() calls on the call stack, the most recent call to recurse() needs to be on a leaf node of the parent tree. In such a case, that recurse() call can lead to, at most, 2 additional isSameTree() calls on the call stack. Based on this logic, the auxiliary space is simply O(LogN). I say all of this to ask: is the auxiliary space O(LogN) or O(LogN + LogM)?
The analysis is easier than you think, because the worst-case inputs (for both time and space) are completely unbalanced trees, i.e., linked lists, with all values the same.
Worst-case time complexity is Θ(NM). When M = N/2, for example, you will do N/2 comparisons of size M, and NM/2 is in Ω(NM).
Worst-case space complexity is Θ(N). You'll never recurse deeper than depth N, and you will reach depth N no matter what M is.
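The depth claim is easy to check empirically. Below is a sketch (the instrumentation decorator and names are mine, not part of either post) that tracks the maximum number of simultaneous frames across both recursive functions on a degenerate chain where the subtree never matches:

```python
import sys
from typing import Optional

sys.setrecursionlimit(10_000)  # the decorator doubles the Python frame count

class TreeNode:
    # Minimal stand-in for LeetCode's TreeNode.
    def __init__(self, val=0, left=None, right=None):
        self.val, self.left, self.right = val, left, right

def chain(n: int, val: int = 1) -> Optional[TreeNode]:
    # Left-leaning chain of n nodes: the degenerate, worst-case shape.
    root = None
    for _ in range(n):
        root = TreeNode(val, root)
    return root

depth = 0
max_depth = 0

def tracked(fn):
    # Record the maximum simultaneous call depth across decorated functions.
    def wrapped(*args):
        global depth, max_depth
        depth += 1
        max_depth = max(max_depth, depth)
        try:
            return fn(*args)
        finally:
            depth -= 1
    return wrapped

@tracked
def is_same_tree(p, q):
    if not p and not q:
        return True
    if not p or not q or p.val != q.val:
        return False
    return is_same_tree(p.left, q.left) and is_same_tree(p.right, q.right)

@tracked
def is_subtree(root, sub_root):
    if not root:
        return False
    if is_same_tree(root, sub_root):
        return True
    return is_subtree(root.left, sub_root) or is_subtree(root.right, sub_root)

N = 500
found = is_subtree(chain(N, val=1), TreeNode(2))  # M = 1, values never match
print(found, max_depth)  # stack depth grows with N, not log N
```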
You can get a simple linear-time algorithm by comparing the subtree-to-find only against subtrees of the same height.
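That suggestion might be sketched as follows (this is my interpretation, not the answer author's code; the names are mine). The key observation is that two nodes of equal height can never be ancestor and descendant, so the candidate subtrees are disjoint and the total comparison work stays linear:

```python
from typing import Optional

class TreeNode:
    # Minimal stand-in for LeetCode's TreeNode.
    def __init__(self, val=0, left=None, right=None):
        self.val, self.left, self.right = val, left, right

def is_subtree_by_height(root: Optional[TreeNode],
                         sub_root: Optional[TreeNode]) -> bool:
    # Run the full comparison only at nodes whose height equals sub_root's.
    def is_same(p, q):
        if not p and not q:
            return True
        if not p or not q or p.val != q.val:
            return False
        return is_same(p.left, q.left) and is_same(p.right, q.right)

    def height(node):
        return -1 if not node else 1 + max(height(node.left), height(node.right))

    if not sub_root:
        return True  # convention: the empty tree matches anywhere

    target = height(sub_root)
    found = False

    def walk(node):
        # Post-order walk: compute each node's height bottom-up and
        # compare only when the height equals the target height.
        nonlocal found
        if not node:
            return -1
        h = 1 + max(walk(node.left), walk(node.right))
        if h == target and not found and is_same(node, sub_root):
            found = True
        return h

    walk(root)
    return found

root = TreeNode(3, TreeNode(4, TreeNode(1), TreeNode(2)), TreeNode(5))
print(is_subtree_by_height(root, TreeNode(4, TreeNode(1), TreeNode(2))))  # True
print(is_subtree_by_height(root, TreeNode(4, TreeNode(1), TreeNode(7))))  # False
```

Since the height-target subtrees are disjoint, each parent node is touched by at most one is_same() comparison, giving O(N + M) total work under this scheme.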