Search code examples
graphscipynetworkxsparse-matrixopen3d

Quickly creating Scipy sparse matrix from adjacency list


I am trying to create a networkx graph from an adjacency list. I am currently doing the following which is slow. I looked into the scipy.sparse matrix options and as far as I could tell none were in the same format as what I am working with. It is complicated by the fact that each item in adj can have a different length.

o3dmesh.adjacency list returns a list of sets. The set adjacency_list[i] contains the indices of adjacent vertices of vertex i. There are no row, column indices, just an absolute index.

import open3d as o3d
import networkx as nx
import scipy

def adj_matrix(adj):
    n = len(adj)
    g= scipy.sparse.dok_matrix((n,n), int)
    for num, i in enumerate(adj):
        g[num, list(i)] = 1
    return g

o3dmesh = mesh = o3d.io.read_triangle_mesh(path)
adj = o3dmesh.adjacency_list
adj = adj_matrix(adj)
graph = nx.from_scipy_sparse_matrix(adj)

Solution

  • I'm not familiar with networkx or your adj list. But from the code I guess it's a list or array of integers in the (0,n) range.

    Here's the most common way of constructing a matrix from such an array:

    In [28]: from scipy import sparse                                                              
    In [29]: adj = np.random.randint(0,10,size=10)                                                 
    In [30]: adj                                                                                   
    Out[30]: array([4, 7, 9, 7, 2, 1, 3, 3, 4, 8])
    In [31]: data = np.ones_like(adj)                                                              
    In [32]: rows = np.arange(10)                                                                  
    In [33]: col = adj                                                                             
    In [34]: M = sparse.coo_matrix((data,(rows, col)), shape=(10,10))                              
    In [35]: M                                                                                     
    Out[35]: 
    <10x10 sparse matrix of type '<class 'numpy.int64'>'
        with 10 stored elements in COOrdinate format>
    In [36]: M.A                                                                                   
    Out[36]: 
    array([[0, 0, 0, 0, 1, 0, 0, 0, 0, 0],
           [0, 0, 0, 0, 0, 0, 0, 1, 0, 0],
           [0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
           [0, 0, 0, 0, 0, 0, 0, 1, 0, 0],
           [0, 0, 1, 0, 0, 0, 0, 0, 0, 0],
           [0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
           [0, 0, 0, 1, 0, 0, 0, 0, 0, 0],
           [0, 0, 0, 1, 0, 0, 0, 0, 0, 0],
           [0, 0, 0, 0, 1, 0, 0, 0, 0, 0],
           [0, 0, 0, 0, 0, 0, 0, 0, 1, 0]])
    

    Though to use your function, I have turn that into a 2d array:

    In [39]: M1 = adj_matrix(adj[:,None])                                                          
    In [40]: M1                                                                                    
    Out[40]: 
    <10x10 sparse matrix of type '<class 'numpy.float64'>'
        with 10 stored elements in Dictionary Of Keys format>
    In [41]: M1.A                                                                                  
    Out[41]: 
    array([[0., 0., 0., 0., 1., 0., 0., 0., 0., 0.],
           [0., 0., 0., 0., 0., 0., 0., 1., 0., 0.],
           [0., 0., 0., 0., 0., 0., 0., 0., 0., 1.],
           [0., 0., 0., 0., 0., 0., 0., 1., 0., 0.],
           [0., 0., 1., 0., 0., 0., 0., 0., 0., 0.],
           [0., 1., 0., 0., 0., 0., 0., 0., 0., 0.],
           [0., 0., 0., 1., 0., 0., 0., 0., 0., 0.],
           [0., 0., 0., 1., 0., 0., 0., 0., 0., 0.],
           [0., 0., 0., 0., 1., 0., 0., 0., 0., 0.],
           [0., 0., 0., 0., 0., 0., 0., 0., 1., 0.]])
    

    If adj is a list (or object array) of lists, the coo construct will be a bit more complicated. Details will depend on what your initial adj looks like. The goal is to have one entry in each of the 3 coo input arrays for each nonzero element of the matrix.

    list of sets

    OK, let's create a list of sets:

    def foo():
        n = np.random.randint(0,10)
        return np.random.randint(0,10,(n,))
    
    In [34]: alist = [set(foo()) for _ in range(10)]                                             
    In [35]: alist                                                                               
    Out[35]: 
    [{0, 1, 3, 6, 9},
     {1, 9},
     {0, 2, 5},
     {0, 1, 2, 5, 8},
     {1, 4, 5, 8},
     {1, 4, 5, 6, 7, 8, 9},
     {0, 3, 4, 7, 8, 9},
     {0, 1, 4, 7},
     {0, 4, 7, 8},
     {1, 5, 7, 8, 9}]
    

    You should have provided such a test case or function. I did ask for a [mcve].

    Your function:

    def adj_matrix(adj):
        n = len(adj)
        g= sparse.dok_matrix((n,n), int)
        for num, i in enumerate(adj):
            g[num, list(i)] = 1
        return g
    

    An adaptation of my previous code to handle the new adj input:

    def my_adj(adj):
        n = len(adj)
        row, col, data = [],[],[]
        for num, i in enumerate(adj):
            c = list(i)
            m = len(c)
            col.append(c)
            data.append([1]*m)
            row.append([num]*m)
        data = np.hstack(data)
        row = np.hstack(row)
        col = np.hstack(col)    
        return sparse.coo_matrix((data, (row, col)), shape=(n,n))
    

    Compare the results:

    In [36]: adj_matrix(alist)                                                                   
    Out[36]: 
    <10x10 sparse matrix of type '<class 'numpy.float64'>'
        with 45 stored elements in Dictionary Of Keys format>
    In [37]: _.A                                                                                 
    Out[37]: 
    array([[1., 1., 0., 1., 0., 0., 1., 0., 0., 1.],
           [0., 1., 0., 0., 0., 0., 0., 0., 0., 1.],
           [1., 0., 1., 0., 0., 1., 0., 0., 0., 0.],
           [1., 1., 1., 0., 0., 1., 0., 0., 1., 0.],
           [0., 1., 0., 0., 1., 1., 0., 0., 1., 0.],
           [0., 1., 0., 0., 1., 1., 1., 1., 1., 1.],
           [1., 0., 0., 1., 1., 0., 0., 1., 1., 1.],
           [1., 1., 0., 0., 1., 0., 0., 1., 0., 0.],
           [1., 0., 0., 0., 1., 0., 0., 1., 1., 0.],
           [0., 1., 0., 0., 0., 1., 0., 1., 1., 1.]])
    In [38]: my_adj(alist)                                                                       
    Out[38]: 
    <10x10 sparse matrix of type '<class 'numpy.int64'>'
        with 45 stored elements in COOrdinate format>
    In [39]: _.A                                                                                 
    Out[39]: 
    array([[1, 1, 0, 1, 0, 0, 1, 0, 0, 1],
           [0, 1, 0, 0, 0, 0, 0, 0, 0, 1],
           [1, 0, 1, 0, 0, 1, 0, 0, 0, 0],
           [1, 1, 1, 0, 0, 1, 0, 0, 1, 0],
           [0, 1, 0, 0, 1, 1, 0, 0, 1, 0],
           [0, 1, 0, 0, 1, 1, 1, 1, 1, 1],
           [1, 0, 0, 1, 1, 0, 0, 1, 1, 1],
           [1, 1, 0, 0, 1, 0, 0, 1, 0, 0],
           [1, 0, 0, 0, 1, 0, 0, 1, 1, 0],
           [0, 1, 0, 0, 0, 1, 0, 1, 1, 1]])
    

    timings:

    In [44]: timeit adj_matrix(alist).tocoo()                                                    
    2.09 ms ± 76.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
    In [45]: timeit my_adj(alist)                                                                
    259 µs ± 203 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)