python numpy igraph numpy-ndarray adjacency-matrix

Sum the edge in adjacency matrix according to group label in numpy array

For example, I got the symmetric adjacency matrix(no self-loop), i.e

A = 
[
[0, 1, 1, 0, 0],
[1, 0, 1, 0, 1],
[1, 1, 0, 1, 0],
[0, 0, 1, 0, 1],
[0, 1, 0, 1, 0]]

Then, I got an array of cluster label i, associated with this graph, i.e

cluster = [1,1,2,2,3]

It means that the node 1 and node 2 are in the same group, node 3 and node 4 are in the same group, node 5 are in a independent group.

The question is that how can I get the sum of the edges within groups and between groups,

Within groups: means the edge between nodes that share the same cluster label, for example, node 1 and node 2 are the in the same group, so the sum is 1 for them, the same for node 3 and node 4. For node 5, its 0.

Between groups: means the edge between nodes that share different cluster label, for example, group 1 and group 2, it means that sum the edge of, node 1 node 3, node 1 node 4, node 2 node 3, node 2 node 4. The answer is 2 between group 1 and group 2.

then return a 2d symmetric array contains the result, i.e

[[1,2,1],
[2,1,1],
[1,1,0]]

Solution

You can use matrix algebra;

cluster = np.array(cluster)
# create cluster-node adjacency matrix
aux = np.identity(cluster.max(),int)[cluster-1]
# we can now count by multiplying
out = aux.T@A@aux
# fix diagonal (which was counted twice)
np.einsum("ii->i",out)[...] //= 2
out
# array([[1, 2, 1],
#        [2, 1, 1],
#        [1, 1, 0]])

To speed this up we can replace the matrix product with (1) if nodes are sorted by cluster:

boundaries = np.diff(cluster,prepend=-1).nonzero()[0]
out =  np.add.reduceat(np.add.reduceat(A,boundaries,1),boundaries,0)

(2) if not:

nc = cluster.max()
out = np.zeros((nc,nc),int)
np.add.at(out,(cluster[:,None]-1,cluster-1),A)