Search code examples
pythonbinary-search-tree

Why does BST node not get removed in the case it is the root with a single child?


I am working on this BST (binary search tree) code challenge:

Write a BST class for a Binary Search Tree. The class should support:

  • Inserting values with the insert method.
  • Removing values with the remove method; this method should only remove the first instance of a given value.
  • Searching for values with the contains method.

Note that you can't remove values from a single-node tree. In other words, calling the remove method on a single-node tree should simply do nothing.

Each BST node has an integer value, a left child node, a right child node. A node is said to be a valid BST node if and only if it satisfies the BST property; its value is strictly greater than the values of every node to its left; its value is less than or equal to the values of every node to its right; and its children nodes are either valid BST nodes themselves or None / null.

Sample Usage

   // Assume the following BST has already been created:

            10
          /     \
         5      15          
       /   \   /   \ 
      2     5 13   22
    /           \
   1            14

And I am currently implementing the remove method. This is my code:

class BST:
    def __init__(self, value):
        self.value = value
        self.left = None
        self.right = None

    def insert(self, value):
        if self is None:
            self = BST(value)
        if value < self.value:
            if self.left is not None:
                self.left.insert(value)
            else: 
                self.left = BST(value)
        if value >= self.value:
            if self.right is not None:
                self.right.insert(value)
            else: 
                self.right = BST(value)
        return self

    def contains(self, value):
        if self == None:
            return False
        if self.value == value:
            return True
        if value < self.value:
            if self.left is not None:
                return self.left.contains(value)
        if value > self.value:
            if self.right is not None:
                return self.right.contains(value)
        return False

   def remove(self, value, parent=None):
        if self is None:
            return self
        elif value < self.value:
            if self.left is not None:
                self.left.remove(value,self)
        elif value > self.value:
            if self.right is not None:
                self.right.remove(value,self)
        elif value == self.value:
            if self.right is None and self.left is None:
                if parent is None:
                    pass
                else:
                    if parent.left == self:
                        parent.left = None
                    if parent.right == self:
                        parent.right = None
            elif self.right is None:
                if parent is None:
                    self = self.left
                else:
                    if parent.left == self:
                        parent.left = self.left
                    if parent.right == self:
                        parent.right = self.left
            elif self.left is None:
                if parent is None:
                    self = self.right
                else:
                    if parent.left == self:
                        parent.left = self.right
                    if parent.right == self:
                        parent.right = self.right
            elif self.right is not None and self.left is not None:
                sub = self.right.minguy()
                self.value = sub
                self.right.remove(sub,self)
                pass
        return self

    def minguy(self):
        while (self.left is not None):
            self = self.left
        return self.value

The problem

Consider this degenerate tree:

  1
   \
    2
     \
      3
       \
        4

...and we have to remove 1. Why does simply self = self.right not do the removing job?

I was expecting 2 to be the root and 1 to not exist and the tree to end up like:

    2
     \
      3
       \
        4

Specifically this code:

        elif self.left is None:
            if parent is None:
                self = self.right
            else:
                if parent.left == self:
                    parent.left = self.right
                if parent.right == self:
                    parent.right = self.right

Here is the code that tests my implementation:

import program
import unittest

BST = program.BST

class TestProgram(unittest.TestCase):
    def test_case_1(self):
        root = BST(1)
        root.insert(2)
        root.insert(3)
        root.insert(4)
        root.remove(1)

        self.assertTrue(not root.contains(1))
        self.assertTrue(root.value == 2)   # This test fails!
        self.assertTrue(root.contains(3))

Solution

  • Why does simply self = self.right not do the removing job?

    That is indeed the statement that is at the core of the problem. self is a local name, and thus self = only assigns something to that local name. Assignments to names can never mutate your tree. To mutate a tree, you need assignments to attributes, not names.

    Quick solution

    In the special case where you need to remove the root node (and don't have a parent node whose left or right attribute to alter), and where that root has exactly one child, you must realise that you cannot change what the caller's root name refers to. Whatever you do, the caller's root will still reference the original root node. So you have to update that root node's attributes.

    There are at least two ways to do that:

    • Either you apply the minguy logic also when the node has only a right child, and you do the "mirrored" action (i.e. implement maxguy) for when the node only has a left child, or...

    • overwrite all the root's attributes (left, right and value) so it becomes a copy of its only child. That child thus becomes disconnected and is therefore deleted instead of the root node.

    Here is what you need to change to implement the second idea:

    First add a helper method to get a copy from another node:

        def copy_from(self, other):
            self.value = other.value
            self.left = other.left
            self.right = other.right
    

    Then replace this:

    self = self.left
    

    ...with:

    self.copy_from(self.left)
    

    And replace:

    self = self.right
    

    ...with:

    self.copy_from(self.right)
    

    This will fix your issue.

    Remarks

    • You have code that tests whether self is None, but this will never happen when your methods are called as methods. I suppose that the testing suite will not call these functions in another way.

    • elif self.right is not None and self.left is not None: -- this is the only case that is left over after all previous conditions, so this could be just an else:

    • When dealing with nodes it is not necessary to explicitly compare with None. You can just do if self.left: and if not self.left:, ...etc.

    • If you allow remove to return a node, then you can do it without the parent parameter, simplifying the code:

          def remove(self, value):
              if value < self.value:
                  if self.left:
                      self.left = self.left.remove(value)
              elif value > self.value:
                  if self.right:
                      self.right = self.right.remove(value)
              elif value == self.value:
                  if not self.right and not self.left:
                      return None
                  elif not self.right or not self.left:
                      self.copy_from(self.left or self.right)
                  else:
                      self.value = self.right.minguy()
                      self.right = self.right.remove(self.value)
              return self
      
    • Concerning the code challenge, I must say this is an inferior challenge. It says that "you can't remove values from a single-node tree". That is of course nonsense, and is inspired by a limited vision on how to implement trees. They force you into a certain implementation where a tree must be initialised with one value immediately, and this is because they make no difference between the concept of a node and the concept of a tree. Yet, there is a difference. The notion of an empty tree exists and there is really no good reason why we should not be able to represent an empty tree, or create one for that matter.

      The better way to implement a tree is to distinguish the concept of a node and a tree. Both deserve a class. But it wouldn't work with the kind of tests they have prepared, and so they enforce an inferior design.