Suppose I have a nested data structure that I want to traverse. The structure consists of nodes, each of which may offer its children via node.get_children_generator(). These children are also of type node and are evaluated lazily, i.e. enumerated by a generator. For simplicity, let us assume that if a node has no children, get_children_generator simply returns an empty list/generator (so we don't have to check for emptiness manually).
In order to traverse this data structure of nested nodes, is it a good idea to simply chain all generators iteratively, creating chains of chains of chains and so forth? Or would this create too much overhead?
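In isolation, the pattern in question looks like this (a minimal sketch, just to make "chain of chains" concrete):

```python
import itertools as it

# Start from a one-element queue and repeatedly wrap it in a new chain,
# appending freshly discovered items at the back.
queue = iter([1])
queue = it.chain(queue, [2, 3])  # chain no. 1
queue = it.chain(queue, [4, 5])  # chain no. 2 wraps chain no. 1

result = list(queue)
print(result)  # -> [1, 2, 3, 4, 5]
```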
What I have in mind is something like the following:
import itertools as it

def traverse_nodes(start_node):
    """Traverses nodes in breadth-first manner.

    First returns the start node.
    For simplicity we require that
    there are no cycles in the data structure,
    i.e. we are dealing with a simple tree.
    """
    node_queue = iter([start_node])
    while True:
        try:
            next_node = next(node_queue)
            yield next_node
            # Next get the children
            child_gen = next_node.get_children_generator()
            # The next code line is the one I am worried about:
            # is it a good idea to make a chain of chains?
            node_queue = it.chain(node_queue, child_gen)
        except StopIteration:
            # There are no more nodes
            break
Is the line node_queue = it.chain(node_queue, child_gen) a good approach to the traversal? Is it a good idea to make a chain of chains of chains, etc.?
So that you actually have something you can execute, here is a rather silly dummy Node class. The generator is a bit useless here, but assume that in a real-world example evaluating the children is costly and really requires a generator.
class Node(object):
    """Rather silly example of a nested node.

    The children are actually stored in a list,
    so the generator is actually not needed.
    But simply assume that returning the children
    requires lazy evaluation.
    """
    counter = 0  # Counter for node identification

    def __init__(self):
        self.children = []  # children list
        self.node_number = Node.counter  # identifies the node
        Node.counter += 1

    def __repr__(self):
        return 'I am node #%d' % self.node_number

    def get_children_generator(self):
        """Returns a generator over children"""
        return (x for x in self.children)
So the following code snippet
node0 = Node()
node1 = Node()
node2 = Node()
node3 = Node()
node4 = Node()
node5 = Node()
node6 = Node()

node0.children = [node1, node2]
node1.children = [node6]
node2.children = [node3, node5]
node3.children = [node4]

for node in traverse_nodes(node0):
    print(node)
prints
I am node #0
I am node #1
I am node #2
I am node #6
I am node #3
I am node #5
I am node #4
Chaining multiple chains results in a recursive-call overhead proportional to the number of chains chained together.
First of all, here is a pure-Python chain implementation, so that we won't lose stack info. The C implementation is here, and you can see it does basically the same thing: it calls next() on the underlying iterable.
from inspect import stack

def chain(it1, it2):
    for collection in [it1, it2]:
        try:
            for el in collection:
                yield el
        except StopIteration:
            pass
We care only about a two-iterable version of chain. We first consume the first iterable, then the other one.
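For instance, wrapping one chain inside another already means every next() call on the outer generator has to pass through the inner one (the chain here is repeated from above so the snippet runs on its own):

```python
# Same two-iterable chain as above, repeated so this snippet is standalone.
def chain(it1, it2):
    for collection in [it1, it2]:
        for el in collection:
            yield el

inner = chain(iter([1]), [2])
outer = chain(inner, [3])  # a chain of chains: next(outer) goes through inner
result = list(outer)
print(result)  # -> [1, 2, 3]
```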
class VerboseListIterator(object):
    def __init__(self, collection, node):
        self.collection = collection
        self.node = node
        self.idx = 0

    def __iter__(self):
        return self

    def __next__(self):
        print('Printing {}th child of "{}". Stack size: {}'.format(
            self.idx, self.node, len(stack())))
        if self.idx >= len(self.collection):
            raise StopIteration()
        self.idx += 1
        return self.collection[self.idx - 1]
This is our handy list iterator that tells us how many stack frames exist when the next element of the wrapped list is returned.
class Node(object):
    """Rather silly example of a nested node.

    The children are actually stored in a list,
    so the generator is actually not needed.
    But simply assume that returning the children
    requires lazy evaluation.
    """
    counter = 0  # Counter for node identification

    def __init__(self):
        self.children = []  # children list
        self.node_number = Node.counter  # identifies the node
        Node.counter += 1

    def __repr__(self):
        return 'I am node #%d' % self.node_number

    def get_children_generator(self):
        """Returns a generator over children"""
        return VerboseListIterator(self.children, self)
def traverse_nodes(start_node):
    """Traverses nodes in breadth-first manner.

    First returns the start node.
    For simplicity we require that
    there are no cycles in the data structure,
    i.e. we are dealing with a simple tree.
    """
    node_queue = iter([start_node])
    while True:
        try:
            next_node = next(node_queue)
            yield next_node
            # Next get the children
            child_gen = next_node.get_children_generator()
            # The next code line is the one I am worried about:
            # is it a good idea to make a chain of chains?
            node_queue = chain(node_queue, child_gen)
        except StopIteration:
            # There are no more nodes
            break
These are your implementations, adjusted to the Python version used (3.4).
nodes = [Node() for _ in range(10)]
nodes[0].children = nodes[1:6]
nodes[1].children = [nodes[6]]
nodes[2].children = [nodes[7]]
nodes[3].children = [nodes[8]]
nodes[4].children = [nodes[9]]
Initialization of the nodes' graph. The root is connected to the first five nodes (node1 through node5), and each node i of the first four is in turn connected to node i + 5.
for node in traverse_nodes(nodes[0]):
    print(node)
The result of this iteration is as follows:
I am node #0
Printing 0th child of "I am node #0". Stack size: 4
I am node #1
Printing 1th child of "I am node #0". Stack size: 5
I am node #2
Printing 2th child of "I am node #0". Stack size: 6
I am node #3
Printing 3th child of "I am node #0". Stack size: 7
I am node #4
Printing 4th child of "I am node #0". Stack size: 8
I am node #5
Printing 5th child of "I am node #0". Stack size: 9
Printing 0th child of "I am node #1". Stack size: 8
I am node #6
Printing 1th child of "I am node #1". Stack size: 9
Printing 0th child of "I am node #2". Stack size: 8
I am node #7
Printing 1th child of "I am node #2". Stack size: 9
Printing 0th child of "I am node #3". Stack size: 8
I am node #8
Printing 1th child of "I am node #3". Stack size: 9
Printing 0th child of "I am node #4". Stack size: 8
I am node #9
Printing 1th child of "I am node #4". Stack size: 9
Printing 0th child of "I am node #5". Stack size: 8
Printing 0th child of "I am node #6". Stack size: 7
Printing 0th child of "I am node #7". Stack size: 6
Printing 0th child of "I am node #8". Stack size: 5
Printing 0th child of "I am node #9". Stack size: 4
As you can see, the closer we get to the end of node0's children list, the bigger the stack. Why is that? Let's take a closer look at each step; each chain call is numbered for clarity:

1. node_queue = [node0].
2. next(node_queue) yields node0. Now node_queue = chain1([node0], [node1, node2, node3, node4, node5]).
3. next(node_queue): the list [node0] is consumed, and the second list starts being consumed. node1 gets yielded, and node_queue = chain2(chain1([node0], [node1, ...]), [node6]).
4. next(node_queue) propagates down to chain1 (from chain2), and node2 is yielded. node_queue = chain3(chain2(chain1([node0], [...]), [node6]), [node7]).

The pattern continues until we are about to yield node5:
next(chain5(chain4, [node9]))
|
V
next(chain4(chain3, [node8]))
|
V
next(chain3(chain2, [node7]))
|
V
next(chain2(chain1, [node6]))
|
V
next(chain1([node0], [node1, node2, node3, node4, node5]))
^
yield
So, is it a good idea? Rather not. A single next(node_queue) call can cause many recursive calls, proportional to the number of chains still pending in the BFS queue — in simple terms, to the maximum number of children of a node in the graph.
Here's a gif showing the algorithm: http://i.imgur.com/hnPIVG4.gif
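The usual way out is a flat queue: keep the pending child generators in a collections.deque instead of nesting chains, so advancing the traversal never recurses. Here is a sketch (traverse_nodes_flat is a name chosen for this answer; it assumes the same Node interface as in the question):

```python
from collections import deque

def traverse_nodes_flat(start_node):
    """Breadth-first traversal with a flat deque of child generators.

    next() on the traversal touches at most one generator at a time,
    so the stack depth stays constant no matter how wide the tree is.
    """
    pending = deque([iter([start_node])])
    while pending:
        try:
            node = next(pending[0])
        except StopIteration:
            pending.popleft()  # this child generator is exhausted
            continue
        yield node
        # Store the generator itself, flat in the deque -- children are
        # still evaluated lazily, but no iterator nesting builds up.
        pending.append(node.get_children_generator())
```

On the example tree from the question this yields the same order: nodes 0, 1, 2, 6, 3, 5, 4.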