Search code examples
pythondata-structuresbig-ospace-complexity

Space Complexity of a Recursion Function Involving Additional Self-Contained Auxiliary Space


I apologize for the poor wording of the question, but it seems that there is no other way to phrase it. Suppose we have the following code, which is just toy code to highlight my concern:

def dfs(root, seen):
  if not root:
    return seen
  
  seen.add(root)
  new_set = set(seen)

  dfs(root.left, new_set)
  dfs(root.right, new_set)

My question is regarding auxiliary space complexity. Typically, when dealing with recursion, the only auxiliary space complexity that needs to be considered is the space occupied on the call stack. However, in this case, we also see that within each function call, a new set is instantiated. In the worst case, this new set will have H elements, where H is the height of the tree. In this way, there will at worst be H function calls on the call stack, each of these calls containing a set of at worst H elements. Therefore, would the auxiliary space complexity be O(H^2)?


Solution

  • Yes, the auxiliary space would be O(ℎ²).

    First of all, I think the intent of your code is that for a given root node, both recursive calls (going left versus going right) get a new_set that has the same members. But this is not true because these recursive calls will add another member to it before it gets cloned. And so the second of these recursive calls will now have two more members than seen had at the top of this function call. I'll first assume this effect was not intended, so modify the code to first clone and then add:

    def dfs(root, seen):
      if not root:
        return seen
      
      new_set = set(seen)
      new_set.add(root)
    
      dfs(root.left, new_set)
      dfs(root.right, new_set)
    

    With this adapted code, there will at a certain moment be ℎ sets that are in use -- one for each distinct recursion depth -- with sizes 1, 2, 3, ..., ℎ, which is a triangular number, and is indeed O(ℎ²).

    Not your question, but this can be reduced to O(ℎ) if you keep using the same set, like so:

    def dfs(root, seen):
      if not root:
        return seen
      
      seen.add(root)
    
      dfs(root.left, seen)
      dfs(root.right, seen)
    
      seen.remove(root)
    

    But if the original code you had was intended to work like that, then the sets might grow with steps of 2, giving set sizes of 1, 3, 5, ..., 2ℎ+1, which is still O(ℎ²).