Search code examples
pythonpass-by-referencepass-by-value

Some Issues in Tree Iteration in Python while the Tree Can be Dynamically Generated on the Fly


I am writing the following functionality. A root nodes is given with 2 set of indices (_indices_lhs and _indices_rhs), and fore every index that apperas in both the left hand side and right hand side it creates a children, where

For example ik, kn will have one children where the _indices_lhs and _indices_rhs of the children will be respectively i and n. In the case of for example ikn, ikn (Identical indices on the left and right hand side) the expected outcome is that you would have 3 children kn, kn; in, in; ik, ik and ofc these 3 children would have each 2 more children. To implement this I iterate the tree with a while loop. I would create the root node and then keep a queue of references of created children, and I would then iterate as long as this queue is not empty to create further children/nodes that might create nodes. And when I iterate through all of the queue and when it is empty, then I can know that no more children can be generated. The code I have implemented for this purpose is here:

The node looks as following, children are created in the member function generate_further_paths. I create all the possible children combinations and add them to _children member variable.

class Node:
    _parent = None
    _depth = 0
    _indices_lhs = list()
    _indices_rhs = list()
    _children_contractions = list()
    _children              = list()

    def __init__(self, indices_lhs: list[str], indices_rhs: list[str], depth: int = 0):
        self._indices_lhs = indices_lhs
        self._indices_rhs = indices_rhs
        self._depth = depth
        print(self)

    def generate_further_paths(self):
        if len(self._indices_lhs) == 0 or len(self._indices_rhs) == 0:
            return

        if len(self._indices_lhs) == 1 and len(self._indices_rhs) == 1 \
            and self._indices_lhs[0] == self._indices_rhs[0]:
            return

        print("D1: ", self._depth, ": ", self._indices_lhs, " | ", self._indices_rhs)
        intersected_indices = []
        for i in self._indices_lhs:
            if i in self._indices_rhs:
                intersected_indices.append(i)

        if len(intersected_indices) == 0:
            return

        print("Intersected indices: ", intersected_indices)
        for ii in intersected_indices:
            child_lhs = copy.deepcopy(self._indices_lhs)
            child_lhs.remove(ii)
            child_rhs = copy.deepcopy(self._indices_rhs)
            child_rhs.remove(ii)
            child = Node(child_lhs, child_rhs, self._depth + 1)
            self._children_contractions.append(ii)
            self._children.append(copy.deepcopy(child))

    def __str__(self):
        return f"Node constructed depth {self._depth}:" + "".join(self._indices_lhs) + " * " + \
            "".join(self._indices_rhs) + ", " + str(len(self._children)) + " children"

    def __repr__(self):
        return '<tree node representation>'

And the loop-approach to generate the tree is as follows. I am trying to ensure that the queue elements are references to the originals:

free_lhs = ["i", "k", "n"]
free_rhs = ["i", "k", "n"]
beginNode = Node(free_lhs, free_rhs, 0)
beginNode.generate_further_paths()
queue = copy.copy(beginNode._children)
while len(queue) != 0:
    nq = list()
    for child in queue:
        child.generate_further_paths()
        nnq = copy.copy(child._children)
        nq += nnq
    queue = nq

I think there is something I understand that there is something wrong happening as in this loop I constantly get more elements, and some nodes in the tree increasing the number of their children. An example output:

D1:  1 :  ['i', 'k']  |  ['i', 'k']
Intersected indices:  ['i', 'k']
Node constructed depth 2:k * k, 4333 children
Node constructed depth 2:i * i, 4334 children

But nodes constructed at depth 2 are never iterated for the generation, but their children are kept pushing back to their member variable _children. I come more from C++ area, and with references, I think this should not have happened so I think something here does not capture by reference.

Since the names are bound to the real objects, I was expecting this approach to not be a problem. I iterate every children overwrite the previous queue with the list of generated children. I need some further insight on have my code behaves, as I have read through this post. and some other sources but I cant grasp it much.


Solution

  • I want to post @slothrop's comment as answer as it solves the problem.

    self._children = list() creates a class variable which is shared upon every instance of the class. Member variables need to be initialized in the __init__ method of the class.