What l need to solve ?
Build a binary matrix from a list of indices. Here is how l proceed but l would like to get an efficient way to do that avoiding loops
Input :
list_indices =[
[0,3,4],
[2,1,0],
[3,5]
]
Expected output :
results=[
[0,1,1,1,1,0],
[1,0,1,0,0,0],
[0,0,0,0,0,0],
[1,0,0,0,1,1],
[1,0,0,1,0,0],
[0,0,1,0,0,0],
]
results correspond to a binary adjacency (symmetric) matrix constructed from a list of indices. 1 in results correspond to a pair of indices belonging to the same row of list_indices.
The pairs from list_indices are :
row 1 : (0,3), (3,0), (0,4),(4,0), (3,4), (4,3)
row 2 : (0,1), (1,0), (2,0), (0,2),(1,2), (2,1)
row 3 : (3,5), (5,3)
number of column and number of rows in results = np.max(list_indices)+1=6
What l have tried ?
results=np.zeros((np.max(list_indices)+1,np.max(list_indices)+1))
for pair in itertools.combinations(list_indices, r=2) :
results[pair[0],pair[1]]=results[pair[1],pair[0]]=1.0
What is the efficient way to build that ? (avoiding loops)
itertools.combinations
returns a list of pairs which are then used to fill the matrix results. Since the matrix is symmetric, itertools.combinations provides a list of pairs corresponding to the upper diagonal matrix. diagonal is set to zero
This question is related strongly with my investigations discussed before 10 days, so I'll post a summary of the most important discoveries here.
Storing communities as lists of unbalanced lengths forces to use iterations or concatenations which is not efficient. Instead you could use a single array and counts like so:
flow = [0,3,4,2,1,0,3,5]
counts = [3,3,2]
A faster way for single combinations is a np.triu_indices
method instead of itertools.combinations
:
def combs(t):
x, y = np.triu_indices(len(t), 1)
return np.transpose([t[x], t[y]])
It is pointed out in my solution that you are looking how to avoid both concatenation and list comprehension in:
np.concatenate([combs(n) for n in list_indices])
or, alternatively (from itertools import*
):
np.array(list(chain(*[combinations(n,2) for n in list_indices])))
I found a couple ways to vectorise it depending on your input:
def repeat_blocks(a, sizes, repeats):
#Thanks @Divakar: https://stackoverflow.com/a/51156138/3044825
r1 = np.repeat(np.arange(len(sizes)), repeats)
N = (sizes*repeats).sum() # or np.dot(sizes, repeats)
id_ar = np.ones(N, dtype=int)
id_ar[0] = 0
insert_index = sizes[r1[:-1]].cumsum()
insert_val = (1-sizes)[r1[:-1]]
insert_val[r1[1:] != r1[:-1]] = 1
id_ar[insert_index] = insert_val
out = a[id_ar.cumsum()]
return out
def concat_combs1(flow, counts):
#way 1 based on repetition of blocks of consecutive items
col1 = repeat_blocks(flow, counts, counts)
col2 = np.repeat(flow, np.repeat(counts, counts))
return np.transpose([col1, col2])[col1 < col2]
def concat_combs2(targets, counts):
#way 2 based on repetition of blocks dissociated from each other
counts = list(map(len, targets))
col1 = np.concatenate(np.repeat(targets, counts, axis=0))
col2 = np.repeat(np.concatenate(targets), np.repeat(counts, counts))
return np.transpose([col1, col2])[col1 < col2]
Test:
list_indices = [np.array([0,3,4]), np.array([2,1,0]), np.array([3,5])]
flow = np.array([0,3,4,2,1,0,3,5])
counts = np.array([3, 3, 2])
# Usage:
np.concatenate([combs(n) for n in list_indices])
concat_combs1(flow, counts)
concat_combs2(list_indices)
Output:
array([[0, 3],
[0, 4],
[3, 4],
[1, 2],
[0, 2],
[0, 1],
[3, 5]])
Four methods has been perfplot
ed on igraph.Graph.Barabasi(n = x, m = 3)
including itertools.combinations
and np.triu_indices
. Each vertex of this graph has 3 neighbours in average. In conclusion, connection of repeated consecutive blocks works the best. Concatenation of numpy arrays is slower than chaining combinations for this time because very large number of small lists is being concatenated.
In order to construct a matrix of incidence in a fastest way, you need to apply a small variation of concat_combs1
method:
flow = np.array([0,3,4,2,1,0,3,5])
counts = np.array([3,3,2])
results = np.zeros((np.max(flow)+1, np.max(flow)+1), dtype=int)
col1 = repeat_blocks(flow, counts, counts)
col2 = np.repeat(flow, np.repeat(counts, counts))
results[col1, col2] = 1
np.fill_diagonal(results, 0)
[[0 1 1 1 1 0]
[1 0 1 0 0 0]
[1 1 0 0 0 0]
[1 0 0 0 1 1]
[1 0 0 1 0 0]
[0 0 0 1 0 0]]