Search code examples
rfor-loopnetwork-programmingadjacency-matrix

Computing second-degree adjacency matrix in directed network


I have a directed network and I'm trying to construct a second-degree adjacency matrix. Suppose the network consists of people looking at each other. From the adjacency matrix I know who looks at whom. For second degree I mean this: for each person, is him looked at by at least one of the people I look at? Then I would like to attach this second-degree adjacency matrix to the initial one.

The following code is a reproducible example of what I've been trying to do, it works, but given the size of my matrices it might take several days to compute:

t <- new("dgCMatrix"
, i = c(3L, 4L, 0L, 1L, 2L, 4L, 2L, 3L, 4L, 1L, 2L, 1L)
, p = c(0L, 2L, 6L, 9L, 11L, 12L)
, Dim = c(5L, 5L)
, Dimnames = list(NULL, NULL)
, x = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1)
, factors = list()
)

a <- numeric(length = 5) #create vector for the loop
b <- numeric(length = 5) #create vector to be filled and then binded

for (y in 1:5){   #example with person 1

  for (i in 1:5){   

    for (j in 1:5){

        if (t[i,j] == 1 & t[j,y] == 1){a[j] <- 1} 
         else {a[j] <- 0}
    }    #if the ones that i looks at, do look at person 1

    if (sum(a) >= 1){b[i] <-  1} else {b[i] <- 0} # if at least one of the people i looks at, looks at 1, then b[i] = 1

}  

t <- cbind(t, b)

}

This is the output, and it is the desired one:

5 x 10 sparse Matrix of class "dgCMatrix"
[[ suppressing 10 column names ‘’, ‘’, ‘’ ... ]]

[1,] . 1 . . . . 1 . 1 1
[2,] . 1 . 1 1 1 1 1 1 1
[3,] . 1 1 1 . 1 1 1 1 1
[4,] 1 . 1 . . . 1 1 1 .
[5,] 1 1 1 . . . 1 1 1 1

It's not computationally intensive, just incredibly long. I had it running for 3 hours and it hadn't completed 1% of the process yet.

Does anyone know a better, faster way of doing this?

Thanks for any help


Solution

  • The following is likely to be much faster, but the result does not have the same dimnames attribute.

    First, the code in the question. The original matrix t is saved to be used later.

    t_save <- t    # save this for later
    
    a <- numeric(length = 5) #create vector for the loop
    b <- numeric(length = 5) #create vector to be filled and then binded
    
    for (y in 1:5){   #example with person 1
      for (i in 1:5){   
        for (j in 1:5){
          if (t[i,j] == 1 & t[j,y] == 1){a[j] <- 1} 
          else {a[j] <- 0}
        }    #if the ones that i looks at, do look at person 1
        if (sum(a) >= 1){b[i] <-  1} else {b[i] <- 0} # if at least one of the people i looks at, looks at 1, then b[i] = 1
      }  
      t <- cbind(t, b)
    }
    result1 <- t
    

    Now the other code giving equivalent results. The original t is retrieved from t_saved. And there is no need to create the vector a.

    t <- t_save
    
    b <- integer(length = 5)
    t2 <- matrix(NA, nrow = nrow(t), ncol = ncol(t))
    for (y in 1:5){   #example with person 1
      for (i in 1:5){
        b[i] <- any(t[i, ] & t[, y])
      }  
      t2[, y] <- as.integer(b)
    }
    result2 <- cbind(t, t2)
    

    Compare both results and see that the only difference are the dim names.

    all.equal(result1, result2)
    #[1] "Attributes: < Component “Dimnames”: Component 2: Modes: character, NULL >"              
    #[2] "Attributes: < Component “Dimnames”: Component 2: Lengths: 10, 0 >"                      
    #[3] "Attributes: < Component “Dimnames”: Component 2: target is character, current is NULL >"
    

    So, don't check the attributes.

    all.equal(result1, result2, check.attributes = FALSE)
    #[1] TRUE
    

    Edit.

    Another option is to use R's matrix multiplication.

    t <- t_save
    
    t2 <- t %*% t
    t2[t2 > 0] <- 1L
    result3 <- cbind(t, t2)
    all.equal(result2, result3)
    #[1] TRUE
    

    Benchmarks.

    The 3 methods above can be written as functions with one argument only, the sparse matrix. In the question that matrix is named t, in the functions' definitions it will be A.

    f1 <- function(A){
      n <- nrow(A)
      a <- numeric(length = n) #create vector for the loop
      b <- numeric(length = n) #create vector to be filled and then binded
    
      for (y in seq_len(n)){   #example with person 1
        for (i in seq_len(n)){   
          for (j in seq_len(n)){
            if (A[i,j] == 1 & A[j,y] == 1){a[j] <- 1} 
            else {a[j] <- 0}
          }    #if the ones that i looks at, do look at person 1
          if (sum(a) >= 1){b[i] <-  1} else {b[i] <- 0} # if at least one of the people i looks at, looks at 1, then b[i] = 1
        }  
        A <- cbind(A, b)
      }
      A
    }
    
    f2 <- function(A){
      n <- nrow(A)
      t2 <- matrix(NA, nrow = nrow(A), ncol = ncol(A))
      b <- numeric(length = n) #create vector to be filled and then binded
      for (y in seq_len(n)){   #example with person 1
        for (i in seq_len(n)){
          b[i] <- +any(A[i, ] & A[, y])
        }  
        t2[, y] <- b
      }
      cbind(A, t2)
    }
    
    f3 <- function(A){
      t2 <- A %*% A
      t2[t2 > 0] <- 1L
      cbind(A, t2)
    }
    

    Now the tests. In order to time them, I will use package microbenchmark.

    library(microbenchmark)
    
    mb <- microbenchmark(
      f1 = f1(t),
      f2 = f2(t),
      f3 = f3(t),
      times = 10
    )
    
    print(mb, order = "median")
    #Unit: milliseconds
    # expr      min        lq      mean    median        uq       max neval cld
    #   f3  2.35833  2.646116  3.354992  2.702440  3.452346  6.795902    10 a  
    #   f2  8.02674  8.062097  8.332795  8.280234  8.398213  9.087690    10  b 
    #   f1 52.08579 52.120208 55.150915 53.949815 57.413373 61.919080    10   c
    

    The matrix multiply function f3 is clearly the fastest.
    The second test will be run with a bigger matrix.

    t_save <- t
    
    for(i in 1:5){
      t <- cbind(t, t)
      t <- rbind(t, t)
    }
    dim(t)
    #[1] 160 160
    

    And will only test f2 and f3.

    mb_big <- microbenchmark(
      f2 = f2(t),
      f3 = f3(t),
      times = 10
    )
    
    print(mb_big, order = "median")
    #Unit: milliseconds
    # expr        min          lq        mean      median          uq         max neval cld
    #   f3    15.8503    15.94404    16.23394    16.07454    16.19684    17.88267    10  a 
    #   f2 10682.5161 10718.67824 10825.92810 10777.95263 10912.53420 11051.10192    10   b
    

    Now the difference is impressive.