I'm beginning to start data structures and algorithms in Python and I am on Binary trees right now. However I'm really confused on how to implement them. I see some implementations where there are two classes one for Node and one for the tree itself like so:
class Node:
def __init__(self, data):
self.right = None
self.left = None
self.data = data
class Tree:
def __init__(self):
self.root = None
def insert()
def preorder()
.
.
.
However I also see implementations where there is no Node class and instead all the code goes inside the Tree class without the following
def __init__(self):
self.root = None
Can someone please help me understand why there are two different ways, the pros and cons of each method, the differences between them and which method I should follow to implement a binary tree.
Thank you!
Yes, there are these two ways.
First of all, in the 1-class approach, that class is really the Node
class (possibly named differently) of the 2-class approach. True, all the methods are then on the Node
class, but that could also have been the case in the 2-class approach, where the Tree
class would defer much of the work by calling methods on the Node
class. So what the 1-class approach is really missing, is the Tree
class. The name of the class can obscure this observation, but it is better to see it that way, even when the methods to work with the tree are on the Node
class.
The difference becomes apparent when you need to represent an empty tree. With the 1-class approach you would have to set your main variable to None
. But that None
is a different concept than a tree without nodes. Because on an object that represents an empty tree one can still call methods to add nodes again. You cannot call methods on None
. It also means that the caller needs to be aware when to change its own variable to None
or to a Node instance, depending on whether the tree happened to be (or become) empty.
With the 2-class system this management is taken out of the hands of the caller, and managed inside the Tree
instance. The caller will always use the single reference to that instance, independent on whether the tree is empty or not.
1-class approach:
root = None
2-class approach:
tree = Tree()
1-class approach
# Cannot use an insert method, as there is no instance yet
root = new Node(1) # Must get the value for the root
2-class approach
tree.insert(1) # No need to be aware of a changing root
As the 1-class approach needs to get the value of the root in this case, you often see that with this approach, the root is always captured like that, even when the tree was not empty. In that case the value of the caller's root
variable will not really change.
root = root.delete(1) # Must get the new value for the root; could be None!
2-class approach
tree.delete(1) # No need to be aware of a changing root
Even though the deletion might in general not lead to an empty tree, in the 1-class approach the caller must take that possibility into account and always assign to its own root reference, as it might become None
.
None
valueSometimes you see approaches where an empty tree is represented by a Node
instance that has a None
-value, so to indicate this is not really data. Then when the first value is added to the tree, that node's value is updated to that value. In this way, the caller does no longer have to manage the root: it will always be a reference to the same Node
instance, which sometimes can have a None
value so to indicate the tree is actually empty.
The code for the caller to work with this 1-class variant, would become the same as the 2-class approach.
This is bad practice. In some cases you may want a tree to be able to have a None
-value as data, and then you cannot make a tree with such a value, as it will be mistaken for a dummy node that represents an empty tree
This is a variant to the above. Here the dummy node is always there. The empty tree is represented by a single node instance. When the first value is inserted into the tree, this dummy node's value is never updated, but the insertion happens to its left
member. So the dummy node never plays a role for holding data, but is just the container that maintains the real root of the tree in its left
member. This really means that the dummy node is the parallel of what the Tree
instance is in the 2-class approach. Instead of having a proper root
member, it has a left
member for that role, and it has no use for its right
member.
The 2-class approach is more like how native data types work. For instance, take the standard list
in Python. An empty list is represented by []
, not by None
. There is a clear distinction between []
and None
. Once you have the list reference (like lst = []
), that reference never changes. You an append and delete, but you never have to assign to lst
itself.