Tags: python, list, numpy, python-itertools, adjacency-matrix

Build adjacency matrix from a list of nodes by avoiding for loops


What do I need to solve?

Build a binary matrix from a list of indices. Here is how I proceed, but I would like an efficient way to do it that avoids loops.

Input :

list_indices =[
[0,3,4],
[2,1,0],
[3,5]
]

Expected output :

results=[
[0,1,1,1,1,0],
[1,0,1,0,0,0],
[1,1,0,0,0,0],
[1,0,0,0,1,1],
[1,0,0,1,0,0],
[0,0,0,1,0,0],
]

results corresponds to a binary (symmetric) adjacency matrix built from the list of indices. A 1 in results corresponds to a pair of indices belonging to the same row of list_indices.

The pairs from list_indices are :

row 1 : (0,3), (3,0), (0,4),(4,0), (3,4), (4,3)
row 2 : (0,1), (1,0), (2,0), (0,2),(1,2), (2,1)
row 3 : (3,5), (5,3)


Number of rows and number of columns in results = largest index + 1 = 6.
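(Note that calling np.max directly on a ragged list of lists may raise an error in recent NumPy versions, since the rows have different lengths; a plain-Python reduction gives the same number:)

```python
list_indices = [[0, 3, 4], [2, 1, 0], [3, 5]]

# largest index across all rows, plus one
n = max(max(row) for row in list_indices) + 1
print(n)  # 6
```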

What have I tried?

import itertools
import numpy as np

n = max(max(row) for row in list_indices) + 1  # largest index + 1
results = np.zeros((n, n))

# generate the pairs within each row and mark them symmetrically
for row in list_indices:
    for pair in itertools.combinations(row, r=2):
        results[pair[0], pair[1]] = results[pair[1], pair[0]] = 1.0

What is an efficient way to build this (avoiding loops)?

itertools.combinations returns the pairs that are then used to fill the matrix results. Since the matrix is symmetric, itertools.combinations provides the pairs of the upper triangle; the diagonal is set to zero.
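As a quick illustration on the first row of the example, itertools.combinations yields exactly the upper-triangle pairs:

```python
from itertools import combinations

# pairs within the first row of list_indices
print(list(combinations([0, 3, 4], 2)))  # [(0, 3), (0, 4), (3, 4)]
```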


Solution

This question is strongly related to my investigations discussed 10 days ago, so I'll post a summary of the most important findings here.

Storing communities as lists of unbalanced lengths forces you to use iterations or concatenations, which is not efficient. Instead, you could use a single flat array plus counts, like so:

          flow = [0,3,4,2,1,0,3,5]
          counts = [3,3,2]
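For reference, flow and counts can be derived from the original list of lists; the small helper below is a sketch (the name to_flat is mine, not part of the answer):

```python
import numpy as np

def to_flat(list_indices):
    # one flat array of all indices, plus the length of each row
    flow = np.concatenate(list_indices).astype(int)
    counts = np.array([len(row) for row in list_indices])
    return flow, counts

flow, counts = to_flat([[0, 3, 4], [2, 1, 0], [3, 5]])
print(flow.tolist())    # [0, 3, 4, 2, 1, 0, 3, 5]
print(counts.tolist())  # [3, 3, 2]
```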
      
A faster way to get the combinations of a single list is np.triu_indices instead of itertools.combinations:

      def combs(t):
          t = np.asarray(t)  # allow plain lists as input
          x, y = np.triu_indices(len(t), 1)
          return np.transpose([t[x], t[y]])
      

Note that you are looking to avoid both the concatenation and the list comprehension in:

      np.concatenate([combs(n) for n in list_indices])
      

or, alternatively (with from itertools import *):

      np.array(list(chain(*[combinations(n,2) for n in list_indices])))
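Both expressions produce the same pairs in the same order, since np.triu_indices enumerates positions (0,1), (0,2), (1,2), … exactly like itertools.combinations does. A quick check:

```python
import numpy as np
from itertools import chain, combinations

def combs(t):
    x, y = np.triu_indices(len(t), 1)
    return np.transpose([t[x], t[y]])

list_indices = [np.array([0, 3, 4]), np.array([2, 1, 0]), np.array([3, 5])]

a = np.concatenate([combs(n) for n in list_indices])
b = np.array(list(chain(*[combinations(n, 2) for n in list_indices])))
print(np.array_equal(a, b))  # True
```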
      
    • I found a couple ways to vectorise it depending on your input:

      def repeat_blocks(a, sizes, repeats):
          #Thanks @Divakar: https://stackoverflow.com/a/51156138/3044825
          r1 = np.repeat(np.arange(len(sizes)), repeats)
          N = (sizes*repeats).sum() # or np.dot(sizes, repeats)
          id_ar = np.ones(N, dtype=int)
          id_ar[0] = 0
          insert_index = sizes[r1[:-1]].cumsum()
          insert_val = (1-sizes)[r1[:-1]]
          insert_val[r1[1:] != r1[:-1]] = 1
          id_ar[insert_index] = insert_val
          out = a[id_ar.cumsum()] 
          return out
      
      def concat_combs1(flow, counts):
          #way 1 based on repetition of blocks of consecutive items
          col1 = repeat_blocks(flow, counts, counts)
          col2 = np.repeat(flow, np.repeat(counts, counts))
          return np.transpose([col1, col2])[col1 < col2]
      
      def concat_combs2(targets):
          # way 2, based on repetition of blocks dissociated from each other
          counts = list(map(len, targets))
          col1 = np.concatenate(np.repeat(targets, counts, axis=0))
          col2 = np.repeat(np.concatenate(targets), np.repeat(counts, counts))
          return np.transpose([col1, col2])[col1 < col2]
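To see what the two columns look like before the col1 &lt; col2 filter, here are the intermediate arrays for the example data (a self-contained sketch repeating repeat_blocks for completeness):

```python
import numpy as np

def repeat_blocks(a, sizes, repeats):
    # https://stackoverflow.com/a/51156138/3044825
    r1 = np.repeat(np.arange(len(sizes)), repeats)
    id_ar = np.ones((sizes * repeats).sum(), dtype=int)
    id_ar[0] = 0
    insert_index = sizes[r1[:-1]].cumsum()
    insert_val = (1 - sizes)[r1[:-1]]
    insert_val[r1[1:] != r1[:-1]] = 1
    id_ar[insert_index] = insert_val
    return a[id_ar.cumsum()]

flow = np.array([0, 3, 4, 2, 1, 0, 3, 5])
counts = np.array([3, 3, 2])

col1 = repeat_blocks(flow, counts, counts)          # each block tiled len(block) times
col2 = np.repeat(flow, np.repeat(counts, counts))   # each element repeated len(block) times
print(col1.tolist())
print(col2.tolist())
print(np.count_nonzero(col1 < col2))  # 7 unique pairs survive the filter
```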
      

      Test:

      list_indices = [np.array([0,3,4]), np.array([2,1,0]), np.array([3,5])]
      flow = np.array([0,3,4,2,1,0,3,5])
      counts = np.array([3, 3, 2])
      # Usage:
      np.concatenate([combs(n) for n in list_indices])
      concat_combs1(flow, counts)
      concat_combs2(list_indices)
      

      Output:

      array([[0, 3],
             [0, 4],
             [3, 4],
             [1, 2],
             [0, 2],
             [0, 1],
             [3, 5]])
      

    Conclusion

Four methods have been benchmarked with perfplot on igraph.Graph.Barabasi(n = x, m = 3), including itertools.combinations and np.triu_indices. Each vertex of this graph has 3 neighbours on average. In conclusion, connecting repeated consecutive blocks works best. Concatenation of numpy arrays is slower than chaining combinations this time because a very large number of small lists is being concatenated.

[perfplot benchmark of the four methods]

    Final solution

To construct the adjacency matrix in the fastest way, you need to apply a small variation of the concat_combs1 method:

    flow = np.array([0,3,4,2,1,0,3,5])
    counts = np.array([3,3,2])
    results = np.zeros((np.max(flow)+1, np.max(flow)+1), dtype=int)
    col1 = repeat_blocks(flow, counts, counts)
    col2 = np.repeat(flow, np.repeat(counts, counts))
    results[col1, col2] = 1
    np.fill_diagonal(results, 0)
    

    Output

    [[0 1 1 1 1 0]
     [1 0 1 0 0 0]
     [1 1 0 0 0 0]
     [1 0 0 0 1 1]
     [1 0 0 1 0 0]
     [0 0 0 1 0 0]]
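Putting it together, the final recipe can be wrapped in a small helper that accepts the original list of lists directly (the function name adjacency_from_lists is my own, not part of the answer):

```python
import numpy as np

def repeat_blocks(a, sizes, repeats):
    # https://stackoverflow.com/a/51156138/3044825
    r1 = np.repeat(np.arange(len(sizes)), repeats)
    id_ar = np.ones((sizes * repeats).sum(), dtype=int)
    id_ar[0] = 0
    insert_index = sizes[r1[:-1]].cumsum()
    insert_val = (1 - sizes)[r1[:-1]]
    insert_val[r1[1:] != r1[:-1]] = 1
    id_ar[insert_index] = insert_val
    return a[id_ar.cumsum()]

def adjacency_from_lists(list_indices):
    # flatten the rows and record their lengths
    flow = np.concatenate(list_indices).astype(int)
    counts = np.array([len(r) for r in list_indices])
    n = flow.max() + 1
    results = np.zeros((n, n), dtype=int)
    # both orientations of each pair are present, so one assignment suffices
    col1 = repeat_blocks(flow, counts, counts)
    col2 = np.repeat(flow, np.repeat(counts, counts))
    results[col1, col2] = 1
    np.fill_diagonal(results, 0)
    return results

print(adjacency_from_lists([[0, 3, 4], [2, 1, 0], [3, 5]]))
```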