r, sparse-matrix, cytoscape

Conversion from pairwise matrix to Cytoscape edge table is too slow


My code does the following. Given a matrix like this:

  a  b  c  d
a 1  NA 3  4
b NA 2  NA 4
c NA NA NA NA
d NA NA NA 4

It converts it to this:

a  a  1
a  c  3
a  d  4
b  b  2
b  d  4
d  d  4

The relevant code is as below:

    pears <- read.delim("pears.txt", header = TRUE, sep = "\t", dec = ".")
    edges <- NULL
    for (i in 1:nrow(pears)) {
            for (j in 1:ncol(pears)) {
                    if (!(is.na(pears[i,j]))) {
                            edges <- rbind(edges, c(rownames(pears)[i], colnames(pears)[j], pears[i,j]))
                    }
            }
            print(i)
    }
    colnames(edges) <- c("gene1", "gene2", "PCC")
    write.table(edges, "edges.txt", row.names = FALSE, quote = FALSE, sep = "\t")

I am running the code in the background on a remote server (using screen -S) on a 17804x17804 sparse (99% NA) matrix. Initially it printed about 5 row indices every 13 seconds, but it has now slowed to about 7 per minute. Why does the algorithm get slower and slower as it progresses? Is there a faster way to convert my matrix into Cytoscape's edge table format?


Solution

  • Convert the data.frame to a matrix, then use melt from reshape2. It returns the row and column names (the dimnames) as two columns and the values as a third column, and na.rm = TRUE drops the rows whose value is NA.

    library(reshape2)
    melt(as.matrix(df1), na.rm = TRUE)
    

    Data:

    df1 <- structure(list(a = c(1L, NA, NA, NA), b = c(NA, 2L, NA, NA),
        c = c(3L, NA, NA, NA), d = c(4L, 4L, NA, 4L)), class = "data.frame",
        row.names = c("a", "b", "c", "d"))