Tags: r, sparse-matrix, adjacency-matrix

How to get an adjacency matrix from a count matrix


I have a very sparse n x p count matrix with only non-negative values and columns named y_1, ..., y_p (n = 2 million, p = 70).

I want to convert it, using R, into a p x p matrix that counts the number of times y_i and y_j both have a non-zero value in the same row.

Example:

ID a b c d e 
1  1 0 1 0 0
2  0 1 1 0 0
3  0 0 1 1 0
4  1 1 0 0 0

and I want to obtain:

- a b c d e
a 2 1 1 0 0
b 1 2 1 0 0 
c 1 1 3 1 0
d 0 0 1 1 0
e 0 0 0 0 0

Solution

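  • This is a single matrix multiplication. Entry (i, j) of t(m) %*% m sums m[k, i] * m[k, j] over all rows k, which for a 0/1 matrix counts the rows where both columns are 1.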

    t(m) %*% m
      a b c d e
    a 2 1 1 0 0
    b 1 2 1 0 0
    c 1 1 3 1 0
    d 0 0 1 1 0
    e 0 0 0 0 0
    

    Using this data:

    m = read.table(text = "ID a b c d e 
    1  1 0 1 0 0
    2  0 1 1 0 0
    3  0 0 1 1 0
    4  1 1 0 0 0", header = TRUE)
    # drop the ID column and keep only the indicator columns
    m = as.matrix(m[, -1])
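
    As a quick sanity check of why the product yields co-occurrence counts, here is a minimal sketch that computes one entry by hand on the example data. The names by_hand and via_product are illustrative only, not part of the original answer.

    ```r
    # Example data from the question, without the ID column
    m <- as.matrix(read.table(text = "a b c d e
    1 0 1 0 0
    0 1 1 0 0
    0 0 1 1 0
    1 1 0 0 0", header = TRUE))

    # Entry (a, c) by hand: number of rows where a and c are both 1
    by_hand <- sum(m[, "a"] * m[, "c"])

    # The same entry taken from the full matrix product
    via_product <- (t(m) %*% m)["a", "c"]

    by_hand      # 1
    via_product  # 1
    ```

    Only row 1 has both a and c set, so both values are 1.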
    

    This relies on the original matrix containing only 1s and 0s. If it contains larger counts, binarize it first with m = original_matrix > 0, so that every non-zero count is treated as 1.
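
    A minimal sketch of that binarizing step, using a small made-up count matrix (counts is hypothetical data, not from the question). It uses base R's crossprod(x), which is equivalent to t(x) %*% x.

    ```r
    # Hypothetical count matrix with values greater than 1
    counts <- matrix(c(3, 0, 2,
                       0, 5, 1,
                       0, 0, 4),
                     nrow = 3, byrow = TRUE,
                     dimnames = list(NULL, c("a", "b", "c")))

    # Binarize: TRUE where the count is non-zero; multiplying by 1
    # turns the logical matrix into a numeric 0/1 matrix
    m_bin <- (counts > 0) * 1

    # Co-occurrence counts, same result as t(m_bin) %*% m_bin
    cooc <- crossprod(m_bin)
    cooc
    ```

    Without the binarizing step, the product would sum products of raw counts instead of counting rows.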


    Here it is working on a matrix like the one you describe:

    library(Matrix)
    nr = 2e6
    nc = 70
    mm = Matrix(0, nrow = nr, ncol = nc, sparse = TRUE)
    
    # set roughly three 1s per row on average (repeated positions are only set once)
    set.seed(47)
    mm[cbind(sample(nr, size = 3 * nr, replace = TRUE), sample(nc, size = 3 * nr, replace = TRUE))] = 1 
    
    system.time({res = t(mm) %*% mm})
      #  user  system elapsed 
      # 0.836   0.057   0.895 
    format(object.size(res), units = "Mb")
      # [1] "0.1 Mb"
    

    On my laptop the calculation takes less than a second and the result is about 0.1 Mb.
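
    If the real data holds counts above 1, the same idea should carry over to a sparse matrix. The following is a sketch on smaller made-up data, assuming the Matrix package is installed. The sizes and the names counts, bin and res are illustrative, and no timings are claimed for it.

    ```r
    library(Matrix)

    set.seed(1)
    nr <- 1000
    nc <- 5
    i <- sample(nr, size = 3000, replace = TRUE)
    j <- sample(nc, size = 3000, replace = TRUE)

    # sparseMatrix() adds up x for repeated (i, j) pairs,
    # so some entries can exceed 1, like real counts
    counts <- sparseMatrix(i = i, j = j, x = 1, dims = c(nr, nc),
                           dimnames = list(NULL, letters[1:nc]))

    # Binarize while staying sparse: zeros stay zero under "> 0"
    bin <- (counts > 0) * 1

    # crossprod(bin) is equivalent to t(bin) %*% bin
    res <- crossprod(bin)
    res
    ```

    The diagonal of res holds, for each column, the number of rows with a non-zero value, which gives a simple way to check the result against colSums(bin).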