I am trying to find triangles in a distance matrix where a direct path is longer than going via another point. The goal is to reduce the number of edges in a fully connected graph. The function works well for low values of n but is slow for larger values. I am trying to find out how to speed up this process.
I had hoped that by keeping the data as a matrix and manipulating it like that the process would be vectorised and very fast, however, this is not what has happened.
I have tried to use lineprof
and click through to lower functions, but I don't understand what it is telling me. I don't know if there is some function in igraph
that would help?
library(purrr);library(magrittr); library(lineprof);library(shiny)
#The function used to find triangles
RemoveTri <- function(s){
Smat<- col(s)
RemoveEdge <- 1:ncol(s) %>%
map(~{
print(.x)
LogicMat <- s + s[,.x][Smat] < (s[,.x]) #I used this method to avoid transposing
matrix(data = rowSums(LogicMat, na.rm = TRUE ) > 0, ncol = 1) #TRUE means edge can be removed
}) %>%
do.call(cbind,.)
s[RemoveEdge] <- NA
return(s)
}
#This function just creates a dataframe
CreateData <- function(n, seed){
set.seed(seed)
s <- matrix(rnorm(n^2), n) #%>% cor
s <- s +abs(min(s))+0.001
s[lower.tri(s)] = t(s)[lower.tri(s)]
diag(s) <- 0
return(s)
}
#Using a small amount of data
s <- CreateData(100, 876)
RemoveTri(s)
#using a larger amount of data
s2 <- CreateData(4000, 876)
RemoveTri(s2)
l <- lineprof(RemoveTri(s))
shine(l)
As the matrix is symmetrical the process can be sped up by only calculating the lower triangular matrix. By doing this we can reduce the number of calculations from $n^3$ to
$\frac{n}{6}(2n^2+3n+1)$ which gives a ratio of $\frac{(2n+1)(n+1)}{6n^2}$ which results in approx 2/3 reduction in the total number of calculations when n is large.
The adjusted function is below.
This function starts slowly and speeds up as more rows are calculated. At small values of n it is slower than the original function due to the additional overhead, but becomes faster when n is greater than a couple of hundred.
RemoveTri <- function(s){
Smat <- col(s)
RemoveEdge <- 1:ncol(s) %>%
map(~{
print(.x)
TargetRows <- .x:ncol(s)
LogicMat <- s[TargetRows,TargetRows, drop = F] + s[TargetRows,.x][Smat[1:length(TargetRows),1:length(TargetRows)]] < s[TargetRows,.x]
matrix(data = c(rep(NA, .x-1),rowSums(LogicMat, na.rm = TRUE ) > 0), ncol = 1) #TRUE means edge should be removed
}) %>%
do.call(cbind,.)
RemoveEdge[upper.tri(RemoveEdge)] <- t(RemoveEdge)[upper.tri(RemoveEdge)]
s[RemoveEdge] <- NA
s
}