
Chaining of Python generators to traverse a nested data structure; best practices


Suppose I have a nested data structure that I want to traverse. This data structure contains nodes which in turn may offer their children via node.get_children_generator(). Of course, these children are also of type node and are evaluated lazily, i.e. enumerated by a generator. For simplicity, let us assume that if a node has no children, get_children_generator simply returns an empty list/generator (so we don't have to check for emptiness manually).

In order to traverse this data structure of nested nodes, is it a good idea to simply chain all generators iteratively? That is creating chains of chains of chains and so forth? Or would this create too much overhead?

What I have in mind is something like the following:

import itertools as it

def traverse_nodes(start_node):
    """Traverses nodes in breadth first manner.
    
    First returns the start node.
    
    For simplicity we require that 
    there are no cycles in the data structure,
    i.e. we are dealing with a simple tree.
    
    """
    node_queue = iter([start_node])
    while True:
        try:
            next_node = next(node_queue)  # built-in next() (Python 3)
            yield next_node
            
            # Next get the children
            child_gen = next_node.get_children_generator()
            
            # The next code line is the one I am worried about
            # is it a good idea to make a chain of chains?
            node_queue = it.chain(node_queue, child_gen)
        except StopIteration:
            # There are no more nodes
            break

Is the line node_queue = it.chain(node_queue, child_gen) a good approach to the traversal? Is it a good idea to make a chain of chains of chains, etc.?

So that you have something you can actually execute, here is a rather silly dummy node class. The generator is a bit useless here, but assume that in a real-world example evaluating the children is costly and really requires a generator.

class Node(object):
    """Rather silly example of a nested node.

    The children are actually stored in a list,
    so the generator is actually not needed.
    
    But simply assume that producing the children
    requires lazy evaluation.
    
    """
    counter = 0 # Counter for node identification
    
    def __init__(self):
        self.children = [] # children list
        self.node_number = Node.counter # identifies the node
        Node.counter += 1
    
    def __repr__(self):
        return 'I am node #%d' % self.node_number
    
    def get_children_generator(self):
        """Returns a generator over children"""
        return (x for x in self.children)

So the following code snippet

node0 = Node()
node1 = Node()
node2 = Node()
node3 = Node()
node4 = Node()
node5 = Node()
node6 = Node()
node0.children = [node1, node2]
node1.children = [node6]
node2.children = [node3, node5]
node3.children = [node4]

for node in traverse_nodes(node0):
    print(node)

prints

I am node #0
I am node #1
I am node #2
I am node #6
I am node #3
I am node #5
I am node #4


Solution

  • Chaining multiple chains results in recursive function-call overhead proportional to the number of chains chained together.

    First of all, here is a pure-Python chain implementation, so that we don't lose stack info. The C implementation is here, and you can see it does basically the same thing: it calls next() on the underlying iterable.

    from inspect import stack
    
    
    def chain(it1, it2):
        # Consume it1 fully, then it2 (a two-iterable version of itertools.chain).
        # Note: the inner for loop already absorbs each iterable's StopIteration,
        # so no explicit try/except is needed.
        for collection in (it1, it2):
            for el in collection:
                yield el
    

    We only care about a two-iterable version of chain: it first consumes the first iterable, then the second.
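    As a quick sanity check (my own snippet, using the real itertools.chain rather than the toy version above), nesting chains the way the traversal does preserves order: each wrapper first drains its left argument, then its right one:

    ```python
    import itertools as it

    # Build a chain of chains, exactly like the traversal does:
    # each new level wraps the previous queue plus a fresh child list.
    queue = iter([0])
    queue = it.chain(queue, [1, 2])   # after visiting node 0
    queue = it.chain(queue, [3])      # after visiting node 1
    queue = it.chain(queue, [4, 5])   # after visiting node 2

    result = list(queue)
    print(result)  # -> [0, 1, 2, 3, 4, 5]
    ```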

    class VerboseListIterator(object):
        def __init__(self, collection, node):
            self.collection = collection
            self.node = node
            self.idx = 0
    
        def __iter__(self):
            return self
    
        def __next__(self):
            print('Printing {}th child of "{}". Stack size: {}'.format(self.idx, self.node, len(stack())))
            if self.idx >= len(self.collection):
                raise StopIteration()
    
            self.idx += 1
            return self.collection[self.idx - 1]
    

    This is our handy list iterator that tells us how many stack frames there are when the next element of the wrapped list is returned.

    class Node(object):
        """Rather silly example of a nested node.
    
        The children are actually stored in a list,
        so the generator is actually not needed.
    
        But simply assume that producing the children
        requires lazy evaluation.
    
        """
        counter = 0 # Counter for node identification
    
        def __init__(self):
            self.children = [] # children list
            self.node_number = Node.counter # identifies the node
            Node.counter += 1
    
        def __repr__(self):
            return 'I am node #%d' % self.node_number
    
        def get_children_generator(self):
            """Returns a generator over children"""
            return VerboseListIterator(self.children, self)
    
    
    def traverse_nodes(start_node):
        """Traverses nodes in breadth first manner.
    
        First returns the start node.
    
        For simplicity we require that
        there are no cycles in the data structure,
        i.e. we are dealing with a simple tree.
    
        """
        node_queue = iter([start_node])
        while True:
            try:
                next_node = next(node_queue)
                yield next_node
    
                # Next get the children
                child_gen = next_node.get_children_generator()
    
                # The next code line is the one I am worried about
                # is it a good idea to make a chain of chains?
                node_queue = chain(node_queue, child_gen)
            except StopIteration:
                # There are no more nodes
                break
    

    These are your implementations, adjusted for the Python version used (3.4).

    nodes = [Node() for _ in range(10)]
    nodes[0].children = nodes[1:6]
    nodes[1].children = [nodes[6]]
    nodes[2].children = [nodes[7]]
    nodes[3].children = [nodes[8]]
    nodes[4].children = [nodes[9]]
    

    Initialization of the node graph. The root is connected to the first 5 nodes; node i (for i = 1..4) is in turn connected to node i + 5.

    for node in traverse_nodes(nodes[0]):
        print(node)
    

    The result of this iteration is as follows:

    I am node #0
    Printing 0th child of "I am node #0". Stack size: 4
    I am node #1
    Printing 1th child of "I am node #0". Stack size: 5
    I am node #2
    Printing 2th child of "I am node #0". Stack size: 6
    I am node #3
    Printing 3th child of "I am node #0". Stack size: 7
    I am node #4
    Printing 4th child of "I am node #0". Stack size: 8
    I am node #5
    Printing 5th child of "I am node #0". Stack size: 9
    Printing 0th child of "I am node #1". Stack size: 8
    I am node #6
    Printing 1th child of "I am node #1". Stack size: 9
    Printing 0th child of "I am node #2". Stack size: 8
    I am node #7
    Printing 1th child of "I am node #2". Stack size: 9
    Printing 0th child of "I am node #3". Stack size: 8
    I am node #8
    Printing 1th child of "I am node #3". Stack size: 9
    Printing 0th child of "I am node #4". Stack size: 8
    I am node #9
    Printing 1th child of "I am node #4". Stack size: 9
    Printing 0th child of "I am node #5". Stack size: 8
    Printing 0th child of "I am node #6". Stack size: 7
    Printing 0th child of "I am node #7". Stack size: 6
    Printing 0th child of "I am node #8". Stack size: 5
    Printing 0th child of "I am node #9". Stack size: 4
    

    As you can see, the closer we get to the end of node0's children list, the bigger the stack becomes. Why is that? Let's take a closer look at each step; each chain call is numbered for clarity:

    1. node_queue = [node0]
    2. Called next(node_queue), yielded node0. node_queue = chain1([node0], [node1, node2, node3, node4, node5]).
    3. Called next(node_queue). The list [node0] is consumed, and the second list is being consumed. node1 gets yielded, and node_queue = chain2(chain1([node0], [node1, ...]), [node6]).
    4. Calling next(node_queue) propagates down to chain1 (from chain2), and node2 is yielded. node_queue = chain3(chain2(chain1([node0], [...]), [node6]), [node7]).
    5. The pattern continues until we are about to yield node5:

      next(chain5(chain4, [node9]))
                    |
                    V
          next(chain4(chain3, [node8]))
                        |
                        V
              next(chain3(chain2, [node7]))
                            |
                            V
                  next(chain2(chain1, [node6]))
                                |
                                V
                      next(chain1([node0], [node1, node2, node3, node4, node5]))
                                                                     ^
                                                                   yield
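      The growth can also be reproduced compactly with a small probe generator (my own illustration, reusing the pure-Python chain from above) that records the live Python stack depth each time an element is produced:

      ```python
      from inspect import stack

      def chain(it1, it2):
          # Same two-iterable pure-Python chain as above.
          for collection in (it1, it2):
              for el in collection:
                  yield el

      depths = []

      def probe(items):
          # Record how many Python frames are live when each element is yielded.
          for x in items:
              depths.append(len(stack()))
              yield x

      queue = probe(['a'])
      queue = chain(queue, probe(['b']))  # one wrapper
      queue = chain(queue, probe(['c']))  # two wrappers
      list(queue)

      # 'a' and 'b' are reached through both chain wrappers, 'c' through
      # only one, so the first two recorded depths are one frame deeper
      # than the last.
      print(depths)
      ```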
      

    Is it faster than simple BFS/DFS?

    Not really. A single next(node_queue) call can trigger many nested calls - roughly proportional to what the iterator queue's size would be in a regular BFS, or, in simple words, the maximum number of children a node in the graph has.
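    For comparison, here is what the conventional breadth-first traversal looks like with an explicit queue (a sketch, not from the original answer; it assumes nodes expose the question's get_children_generator()). Every step does O(1) queue work at a constant stack depth, no matter how many nodes have already been visited:

    ```python
    from collections import deque

    def traverse_nodes_bfs(start_node):
        """Breadth-first traversal with an explicit FIFO queue."""
        queue = deque([start_node])
        while queue:
            node = queue.popleft()  # O(1) at both ends with deque
            yield node
            # Drain the lazily produced children into the queue.
            queue.extend(node.get_children_generator())
    ```

    Using deque rather than a plain list matters here: list.pop(0) is O(n), while deque.popleft() is O(1).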

    tl;dr

    Here's a gif showing the algorithm: http://i.imgur.com/hnPIVG4.gif