I'd like to convert a `for` loop to `foreach` to make it faster. From the spatial data, I get X and Y coordinates. New York has about 15,000 block groups, so the distance matrix (`mat`) has 15,000 × 15,000 cells. I'm not sure why, but computing the distances (`A`) doesn't take much time. The bottleneck is the `for` loop. I need a weight matrix (`W`) that holds, for each block group, tri-cube weights for its 100 nearest block groups (neighbors). The following code works, but it's too slow. I'd like to make it more efficient using the `foreach` and `parallel` packages. Could you let me know how to rewrite the following `for` loop? Thank you so much.
```
coor<-cbind(UA$X, UA$Y) # X, Y coordinates
A<-dist(coor, diag=T, upper=T) # distances between coordinates
mat <- as.matrix(A)
q<-100 # it can be changed
W<-array(0L, dim(mat)) # use dim(mat): a dist object has no dim attribute
for (i in 1:nrow(mat)){
W[order(mat[,i])[1:q],i]<-mat[order(mat[,i])[1:q],i]
D<-apply(W, 2, max, na.rm=TRUE)[i]
W[order(mat[,i])[1:q],i]<-(1-(W[order(mat[,i])[1:q],i]/D)^3)^3 #tri-cube function
}
```
Something like...
```
coor<-cbind(UA$X, UA$Y) # X, Y coordinates
A<-dist(coor, diag=T, upper=T) # distances between coordinates
mat <- as.matrix(A)
q<-100 # it can be changed
W<-array(0L, dim(mat)) # use dim(mat): a dist object has no dim attribute
foreach::foreach(i = 1:nrow(mat)) %dopar% {
W[order(mat[,i])[1:q],i]<-mat[order(mat[,i])[1:q],i]
D<-apply(W, 2, max, na.rm=TRUE)[i]
W[order(mat[,i])[1:q],i]<-(1-(W[order(mat[,i])[1:q],i]/D)^3)^3 #tri-cube function
}
```
This isn't well suited for parallelization. It would require too much overhead passing data back and forth. This is the kind of problem that nearest-neighbor algorithms and sparse matrices were made for.
```
set.seed(588345973)
x <- runif(15e3)
y <- runif(15e3)
q <- 100L

library(RANN)
library(Matrix)

system.time(
  W <- with(
    nn2(cbind(x, y), k = q),
    sparseMatrix(
      i = nn.idx,
      j = rep.int(1:length(x), q),
      x = c((1 - (nn.dists/nn.dists[,q])^3)^3)
    )
  )
)
#>    user  system elapsed
#>    0.42    0.02    0.44
```
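To make the indexing above easier to follow, here is a small sketch of the same construction on ten made-up points with three neighbors each. The names `pts`, `k`, `nn`, and `Wsmall` are illustrative and not part of the original code. `nn2` returns two matrices with one row per point and one column per neighbor rank; flattening them column by column is what lets `j` simply repeat the point indices `k` times.

```r
library(RANN)
library(Matrix)

set.seed(1)
pts <- cbind(runif(10), runif(10))  # 10 illustrative points
k <- 3L                             # neighbors per point (the point itself counts)

nn <- nn2(pts, k = k)
# nn$nn.idx[r, ]   : indices of the k nearest points to point r
# nn$nn.dists[r, ] : the matching distances, in increasing order

Wsmall <- sparseMatrix(
  i = c(nn$nn.idx),                       # neighbor index (row of W)
  j = rep.int(seq_len(nrow(pts)), k),     # focal point (column of W)
  x = c((1 - (nn$nn.dists / nn$nn.dists[, k])^3)^3),  # tri-cube weight
  dims = c(nrow(pts), nrow(pts))
)

Wsmall        # column r holds the weights of point r's neighbors
diag(Wsmall)  # each point is its own nearest neighbor (distance 0)
```

Each distance is divided by the distance to the `k`-th neighbor of the same point, so the weights fall between 0 and 1, with the farthest retained neighbor getting weight 0.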
Compare to an optimized version of the original approach:
```
library(Rfast) # for `Dist`

system.time({
  mat <- Dist(cbind(x, y))
  W2 <- array(0, dim(mat))
  for (i in 1:nrow(mat)) {
    o <- order(mat[,i])[1:q]
    W2[o,i] <- (1 - (mat[o,i]/mat[o[q],i])^3)^3
  }
})
#>    user  system elapsed
#>   13.07    1.11   14.19
```
Check that the results are equivalent:
```
all(sapply(1:length(x), \(i) all.equal(W[,i], W2[,i])))
#> [1] TRUE
```
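If a `foreach` version is still wanted, the loop body has to be restructured first. With a cluster backend the workers run in separate R processes, so assigning into `W` inside the loop body never reaches the main session. Each iteration has to return its column instead, and `.combine` assembles them. The following is only a rough sketch on a small made-up data set. It assumes the `doParallel` package as the backend, and the names `n`, `cl`, and `W3` are illustrative. At 15,000 points the distance matrix would also have to be sent to every worker, which is the overhead mentioned at the top of this answer.

```r
library(foreach)
library(doParallel)

set.seed(1)
n <- 500L                       # small illustrative problem size
q <- 10L                        # neighbors per point
coor <- cbind(runif(n), runif(n))
mat <- as.matrix(dist(coor))

cl <- parallel::makeCluster(2L) # two worker processes (an arbitrary choice)
registerDoParallel(cl)

# Each iteration builds and returns one column; cbind glues them together.
W3 <- foreach(i = seq_len(n), .combine = cbind) %dopar% {
  col <- numeric(n)
  o <- order(mat[, i])[seq_len(q)]                 # q nearest points to i
  col[o] <- (1 - (mat[o, i] / mat[o[q], i])^3)^3   # tri-cube weights
  col
}

parallel::stopCluster(cl)
dim(W3)
```

This produces a dense matrix like the `for` loop does, so it has the same memory cost as the original approach.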