Search code examples
pythonscipytreenetworkxdepth-first-search

Predecessors from scipy depth_first_order


I use scipy version 1.14.1 to traverse the minimum spanning tree in depth-first order, but I do not understand some results, namely the predecessors returned by scipy are not correct.

Here is an illustration for the following graph:

enter image description here

The following code

import numpy as np
from scipy.sparse import coo_matrix
from scipy.sparse.csgraph import minimum_spanning_tree
from scipy.sparse.csgraph import depth_first_order

rows = np.array([0, 1, 2, 2, 4, 9, 2,  2, 10, 10, 8 ])
cols = np.array([1, 2, 3, 4, 9, 5, 6, 10, 11,  8, 7 ])

# construct undirected graph
X = coo_matrix( (12,12))
X.col = np.concatenate( (rows, cols), axis=0)
X.row = np.concatenate( (cols, rows), axis=0)
X.data = np.ones(len(X.row))

# the minimum spanning tree is the graph itself
tree = minimum_spanning_tree(X)
print(tree)

# traversing the graph
print(depth_first_order(tree, i_start=0, directed=False, return_predecessors=True))

gives the minimum spanning tree (the graph itself in fact):

  Coords    Values
  (0, 1)    1.0
  (1, 2)    1.0
  (2, 3)    1.0
  (2, 4)    1.0
  (2, 6)    1.0
  (2, 10)   1.0
  (4, 9)    1.0
  (5, 9)    1.0
  (7, 8)    1.0
  (8, 10)   1.0
  (10, 11)  1.0

and the depth-first order:

[ 0, 1, 2, 3, 4, 9, 5, 6, 10, 11, 8, 7]

predecessors: [-9999, 0, 1, 2, 2, 9, 2, 8,10, 4, 2, 10]

So it says that 9 has 9 as ancestor, but it is 4, and from that position on results are not coherent.

Thanks for any help.


Solution

  • From the documentation, the function depth_first_order() returns the following two lists:

    node_array ndarray, one dimension The depth-first list of nodes, starting with specified node. The length of node_array is the number of nodes reachable from the specified node.

    predecessors ndarray, one dimension Returned only if return_predecessors is True. The length-N list of predecessors of each node in a depth-first tree. If node i is in the tree, then its parent is given by predecessors[i]. If node i is not in the tree (and for the parent node) then predecessors[i] = -9999.

    It basically implies that the second list predecessors returned has nothing to do with the first list node_array returned. Rather, the 2nd list is to be read as:

    [π[i] if i ∈ G else -9999 for all i], where π[i] is parent of node i and the graph G is a (spanning) tree here.

    The list predecessors = [-9999, 0, 1, 2, 2, 9, 2, 8,10, 4, 2, 10] is to be read as following

    π[0] = -9999 (since node 0 has no parent in G)

    π[1] = 0 (parent of node 1 is node 0)

    π[2] = 1

    π[3] = 2

    π[4] = 2

    π[[5] = 9

    π[6] = 2

    π[7] = 8

    ...

    The same result you can obtain with networkx.dfs_predecessors(), more explicitly as a dictionary, where key is a node and value is the parent of the node (obtained using dfs traversal):

    import networkx as nx
    nx.dfs_predecessors(tree, source=0)
    # {1: 0, 2: 1, 3: 2, 4: 2, 9: 4, 5: 9, 6: 2, 10: 2, 11: 10, 8: 10, 7: 8}