Search code examples
pythonnumpyigraphnumpy-ndarrayadjacency-matrix

Sum the edge in adjacency matrix according to group label in numpy array


For example, I got the symmetric adjacency matrix(no self-loop), i.e

A = 
[
[0, 1, 1, 0, 0],
[1, 0, 1, 0, 1],
[1, 1, 0, 1, 0],
[0, 0, 1, 0, 1],
[0, 1, 0, 1, 0]]

Then, I got an array of cluster label i, associated with this graph, i.e

cluster = [1,1,2,2,3]

It means that the node 1 and node 2 are in the same group, node 3 and node 4 are in the same group, node 5 are in a independent group.

The question is that how can I get the sum of the edges within groups and between groups,

Within groups: means the edge between nodes that share the same cluster label, for example, node 1 and node 2 are the in the same group, so the sum is 1 for them, the same for node 3 and node 4. For node 5, its 0.

Between groups: means the edge between nodes that share different cluster label, for example, group 1 and group 2, it means that sum the edge of, node 1 node 3, node 1 node 4, node 2 node 3, node 2 node 4. The answer is 2 between group 1 and group 2.

then return a 2d symmetric array contains the result, i.e

[[1,2,1],
[2,1,1],
[1,1,0]]

Solution

  • You can use matrix algebra;

    cluster = np.array(cluster)
    # create cluster-node adjacency matrix
    aux = np.identity(cluster.max(),int)[cluster-1]
    # we can now count by multiplying
    out = aux.T@A@aux
    # fix diagonal (which was counted twice)
    np.einsum("ii->i",out)[...] //= 2
    out
    # array([[1, 2, 1],
    #        [2, 1, 1],
    #        [1, 1, 0]])
    

    To speed this up we can replace the matrix product with (1) if nodes are sorted by cluster:

    boundaries = np.diff(cluster,prepend=-1).nonzero()[0]
    out =  np.add.reduceat(np.add.reduceat(A,boundaries,1),boundaries,0)
    

    (2) if not:

    nc = cluster.max()
    out = np.zeros((nc,nc),int)
    np.add.at(out,(cluster[:,None]-1,cluster-1),A)