Binary Tree/Search Tree

I'm beginning to start data structures and algorithms in Python and I am on Binary trees right now. However I'm really confused on how to implement them. I see some implementations where there are two classes one for Node and one for the tree itself like so:

class Node:
    def __init__(self, data):
    self.right = None
    self.left = None
    self.data = data 

class Tree: 
    def __init__(self):
    self.root = None
    
    def insert()
    def preorder()
    .
    .
    .

However I also see implementations where there is no Node class and instead all the code goes inside the Tree class without the following

def __init__(self):
    self.root = None

Can someone please help me understand why there are two different ways, the pros and cons of each method, the differences between them and which method I should follow to implement a binary tree.

Thank you!

Solution

Yes, there are these two ways.

First of all, in the 1-class approach, that class is really the Node class (possibly named differently) of the 2-class approach. True, all the methods are then on the Node class, but that could also have been the case in the 2-class approach, where the Tree class would defer much of the work by calling methods on the Node class. So what the 1-class approach is really missing, is the Tree class. The name of the class can obscure this observation, but it is better to see it that way, even when the methods to work with the tree are on the Node class.

The difference becomes apparent when you need to represent an empty tree. With the 1-class approach you would have to set your main variable to None. But that None is a different concept than a tree without nodes. Because on an object that represents an empty tree one can still call methods to add nodes again. You cannot call methods on None. It also means that the caller needs to be aware when to change its own variable to None or to a Node instance, depending on whether the tree happened to be (or become) empty.

With the 2-class system this management is taken out of the hands of the caller, and managed inside the Tree instance. The caller will always use the single reference to that instance, independent on whether the tree is empty or not.

Examples

1. Creating an empty tree

1-class approach:

root = None

2-class approach:

tree = Tree()

2. Adding a first value to an empty tree

1-class approach

# Cannot use an insert method, as there is no instance yet
root = new Node(1)  # Must get the value for the root

2-class approach

tree.insert(1)  # No need to be aware of a changing root

As the 1-class approach needs to get the value of the root in this case, you often see that with this approach, the root is always captured like that, even when the tree was not empty. In that case the value of the caller's root variable will not really change.

3. Deleting the last value from a tree

root = root.delete(1)  # Must get the new value for the root; could be None!

2-class approach

tree.delete(1)  # No need to be aware of a changing root

Even though the deletion might in general not lead to an empty tree, in the 1-class approach the caller must take that possibility into account and always assign to its own root reference, as it might become None.

Variants for 1-class systems

1. A `None` value

Sometimes you see approaches where an empty tree is represented by a Node instance that has a None-value, so to indicate this is not really data. Then when the first value is added to the tree, that node's value is updated to that value. In this way, the caller does no longer have to manage the root: it will always be a reference to the same Node instance, which sometimes can have a None value so to indicate the tree is actually empty.

The code for the caller to work with this 1-class variant, would become the same as the 2-class approach.

This is bad practice. In some cases you may want a tree to be able to have a None-value as data, and then you cannot make a tree with such a value, as it will be mistaken for a dummy node that represents an empty tree

2. Sentinel node

This is a variant to the above. Here the dummy node is always there. The empty tree is represented by a single node instance. When the first value is inserted into the tree, this dummy node's value is never updated, but the insertion happens to its left member. So the dummy node never plays a role for holding data, but is just the container that maintains the real root of the tree in its left member. This really means that the dummy node is the parallel of what the Tree instance is in the 2-class approach. Instead of having a proper root member, it has a left member for that role, and it has no use for its right member.

How native data structures are implemented

The 2-class approach is more like how native data types work. For instance, take the standard list in Python. An empty list is represented by [], not by None. There is a clear distinction between [] and None. Once you have the list reference (like lst = []), that reference never changes. You an append and delete, but you never have to assign to lst itself.