Search code examples
rmatrixigraphadjacency-matrix

Find triangles with shorter edges in a distance matrix


I am trying to find triangles in a distance matrix where a direct path is longer than going via another point. The goal is to reduce the number of edges in a fully connected graph. The function works well for low values of n but is slow for larger values. I am trying to find out how to speed up this process.

I had hoped that by keeping the data as a matrix and manipulating it like that the process would be vectorised and very fast, however, this is not what has happened.

I have tried to use lineprof and click through to lower functions, but I don't understand what it is telling me. I don't know if there is some function in igraph that would help?

library(purrr);library(magrittr); library(lineprof);library(shiny)

#The function used to find triangles
RemoveTri <- function(s){
  Smat<- col(s) 
  RemoveEdge <- 1:ncol(s) %>%
  map(~{
  print(.x)
    LogicMat <- s + s[,.x][Smat] < (s[,.x]) #I used this method to avoid transposing
    matrix(data = rowSums(LogicMat, na.rm = TRUE ) > 0, ncol = 1) #TRUE means edge can be removed
  }) %>%
  do.call(cbind,.)
  s[RemoveEdge] <- NA
return(s)
}

#This function just creates a dataframe
CreateData <- function(n, seed){
  set.seed(seed)
  s <- matrix(rnorm(n^2), n) #%>% cor
  s <- s +abs(min(s))+0.001
  s[lower.tri(s)] = t(s)[lower.tri(s)]
  diag(s) <- 0
  return(s)
 }

#Using a small amount of data
s <- CreateData(100, 876)
RemoveTri(s)

#using a larger amount of data
s2 <- CreateData(4000, 876)
RemoveTri(s2)

l <- lineprof(RemoveTri(s))
shine(l)

Solution

  • As the matrix is symmetrical the process can be sped up by only calculating the lower triangular matrix. By doing this we can reduce the number of calculations from $n^3$ to
    $\frac{n}{6}(2n^2+3n+1)$ which gives a ratio of $\frac{(2n+1)(n+1)}{6n^2}$ which results in approx 2/3 reduction in the total number of calculations when n is large.

    The adjusted function is below.

    This function starts slowly and speeds up as more rows are calculated. At small values of n it is slower than the original function due to the additional overhead, but becomes faster when n is greater than a couple of hundred.

    RemoveTri  <- function(s){
          Smat <- col(s) 
    
          RemoveEdge <- 1:ncol(s) %>%
          map(~{
            print(.x)
            TargetRows <- .x:ncol(s)
            LogicMat <- s[TargetRows,TargetRows, drop = F] + s[TargetRows,.x][Smat[1:length(TargetRows),1:length(TargetRows)]]  < s[TargetRows,.x]
    
    
            matrix(data = c(rep(NA, .x-1),rowSums(LogicMat, na.rm = TRUE ) > 0), ncol = 1) #TRUE means edge should be removed
    
          }) %>%
          do.call(cbind,.)
    
          RemoveEdge[upper.tri(RemoveEdge)]  <- t(RemoveEdge)[upper.tri(RemoveEdge)]
    
          s[RemoveEdge] <- NA 
    
        s
    
    }