I have a directed network and I'm trying to construct a second-degree adjacency matrix. Suppose the network consists of people looking at each other. From the adjacency matrix I know who looks at whom. For second degree I mean this: for each person, is him looked at by at least one of the people I look at? Then I would like to attach this second-degree adjacency matrix to the initial one.
The following code is a reproducible example of what I've been trying to do, it works, but given the size of my matrices it might take several days to compute:
t <- new("dgCMatrix"
, i = c(3L, 4L, 0L, 1L, 2L, 4L, 2L, 3L, 4L, 1L, 2L, 1L)
, p = c(0L, 2L, 6L, 9L, 11L, 12L)
, Dim = c(5L, 5L)
, Dimnames = list(NULL, NULL)
, x = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1)
, factors = list()
)
a <- numeric(length = 5) #create vector for the loop
b <- numeric(length = 5) #create vector to be filled and then binded
for (y in 1:5){ #example with person 1
for (i in 1:5){
for (j in 1:5){
if (t[i,j] == 1 & t[j,y] == 1){a[j] <- 1}
else {a[j] <- 0}
} #if the ones that i looks at, do look at person 1
if (sum(a) >= 1){b[i] <- 1} else {b[i] <- 0} # if at least one of the people i looks at, looks at 1, then b[i] = 1
}
t <- cbind(t, b)
}
This is the output, and it is the desired one:
5 x 10 sparse Matrix of class "dgCMatrix"
[[ suppressing 10 column names ‘’, ‘’, ‘’ ... ]]
[1,] . 1 . . . . 1 . 1 1
[2,] . 1 . 1 1 1 1 1 1 1
[3,] . 1 1 1 . 1 1 1 1 1
[4,] 1 . 1 . . . 1 1 1 .
[5,] 1 1 1 . . . 1 1 1 1
It's not computationally intensive, just incredibly long. I had it running for 3 hours and it hadn't completed 1% of the process yet.
Does anyone know a better, faster way of doing this?
Thanks for any help
The following is likely to be much faster, but the result does not have the same dimnames
attribute.
First, the code in the question. The original matrix t
is saved to be used later.
t_save <- t # save this for later
a <- numeric(length = 5) #create vector for the loop
b <- numeric(length = 5) #create vector to be filled and then binded
for (y in 1:5){ #example with person 1
for (i in 1:5){
for (j in 1:5){
if (t[i,j] == 1 & t[j,y] == 1){a[j] <- 1}
else {a[j] <- 0}
} #if the ones that i looks at, do look at person 1
if (sum(a) >= 1){b[i] <- 1} else {b[i] <- 0} # if at least one of the people i looks at, looks at 1, then b[i] = 1
}
t <- cbind(t, b)
}
result1 <- t
Now the other code giving equivalent results. The original t
is retrieved from t_saved
. And there is no need to create the vector a
.
t <- t_save
b <- integer(length = 5)
t2 <- matrix(NA, nrow = nrow(t), ncol = ncol(t))
for (y in 1:5){ #example with person 1
for (i in 1:5){
b[i] <- any(t[i, ] & t[, y])
}
t2[, y] <- as.integer(b)
}
result2 <- cbind(t, t2)
Compare both results and see that the only difference are the dim names.
all.equal(result1, result2)
#[1] "Attributes: < Component “Dimnames”: Component 2: Modes: character, NULL >"
#[2] "Attributes: < Component “Dimnames”: Component 2: Lengths: 10, 0 >"
#[3] "Attributes: < Component “Dimnames”: Component 2: target is character, current is NULL >"
So, don't check the attributes.
all.equal(result1, result2, check.attributes = FALSE)
#[1] TRUE
Edit.
Another option is to use R's matrix multiplication.
t <- t_save
t2 <- t %*% t
t2[t2 > 0] <- 1L
result3 <- cbind(t, t2)
all.equal(result2, result3)
#[1] TRUE
The 3 methods above can be written as functions with one argument only, the sparse matrix. In the question that matrix is named t
, in the functions' definitions it will be A
.
f1 <- function(A){
n <- nrow(A)
a <- numeric(length = n) #create vector for the loop
b <- numeric(length = n) #create vector to be filled and then binded
for (y in seq_len(n)){ #example with person 1
for (i in seq_len(n)){
for (j in seq_len(n)){
if (A[i,j] == 1 & A[j,y] == 1){a[j] <- 1}
else {a[j] <- 0}
} #if the ones that i looks at, do look at person 1
if (sum(a) >= 1){b[i] <- 1} else {b[i] <- 0} # if at least one of the people i looks at, looks at 1, then b[i] = 1
}
A <- cbind(A, b)
}
A
}
f2 <- function(A){
n <- nrow(A)
t2 <- matrix(NA, nrow = nrow(A), ncol = ncol(A))
b <- numeric(length = n) #create vector to be filled and then binded
for (y in seq_len(n)){ #example with person 1
for (i in seq_len(n)){
b[i] <- +any(A[i, ] & A[, y])
}
t2[, y] <- b
}
cbind(A, t2)
}
f3 <- function(A){
t2 <- A %*% A
t2[t2 > 0] <- 1L
cbind(A, t2)
}
Now the tests. In order to time them, I will use package microbenchmark
.
library(microbenchmark)
mb <- microbenchmark(
f1 = f1(t),
f2 = f2(t),
f3 = f3(t),
times = 10
)
print(mb, order = "median")
#Unit: milliseconds
# expr min lq mean median uq max neval cld
# f3 2.35833 2.646116 3.354992 2.702440 3.452346 6.795902 10 a
# f2 8.02674 8.062097 8.332795 8.280234 8.398213 9.087690 10 b
# f1 52.08579 52.120208 55.150915 53.949815 57.413373 61.919080 10 c
The matrix multiply function f3
is clearly the fastest.
The second test will be run with a bigger matrix.
t_save <- t
for(i in 1:5){
t <- cbind(t, t)
t <- rbind(t, t)
}
dim(t)
#[1] 160 160
And will only test f2
and f3
.
mb_big <- microbenchmark(
f2 = f2(t),
f3 = f3(t),
times = 10
)
print(mb_big, order = "median")
#Unit: milliseconds
# expr min lq mean median uq max neval cld
# f3 15.8503 15.94404 16.23394 16.07454 16.19684 17.88267 10 a
# f2 10682.5161 10718.67824 10825.92810 10777.95263 10912.53420 11051.10192 10 b
Now the difference is impressive.