My code is similar to this. Given a matrix like this:
a b c d
a 1 NA 3 4
b NA 2 NA 4
c NA NA NA NA
d NA NA NA 4
It converts it to this:
a a 1
a c 3
a d 4
b b 2
b d 4
d d 4
The relevant code is as below:
pears <- read.delim("pears.txt", header = TRUE, sep = "\t", dec = ".")
edges <- NULL
for (i in 1:nrow(pears)) {
  for (j in 1:ncol(pears)) {
    if (!(is.na(pears[i,j]))) {
      edges <- rbind(edges, c(rownames(pears)[i], colnames(pears)[j], pears[i,j]))
    }
  }
  print(i)
}
colnames(edges) <- c("gene1", "gene2", "PCC")
write.table(edges, "edges.txt", row.names = FALSE, quote = FALSE, sep = "\t")
When I run the code on a remote server in the background using screen -S on a 17804x17804 sparse (99% NA) matrix, it initially prints 5 row numbers every 13 seconds. However, it has now slowed down to 7 row numbers every minute. Why does the algorithm get slower and slower as it progresses? Is there a quicker way to convert my matrix into Cytoscape's format?
We convert the data.frame to a matrix, then use melt from reshape2 to get the dimnames as two columns with the values as a third column, passing na.rm = TRUE to drop the NA rows:
library(reshape2)
melt(as.matrix(df1), na.rm = TRUE)
df1 <- structure(list(a = c(1L, NA, NA, NA), b = c(NA, 2L, NA, NA),
    c = c(3L, NA, NA, NA), d = c(4L, 4L, NA, 4L)),
    class = "data.frame", row.names = c("a", "b", "c", "d"))
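On the slowdown itself: calling rbind inside the loop copies the entire accumulated edges matrix every time a row is added, so each append gets more expensive as edges grows. If you would rather not depend on reshape2, the same conversion can be sketched in base R with which(..., arr.ind = TRUE), which finds all non-NA cells in one vectorised pass. This is only a minimal sketch: the matrix m below is a hand-built stand-in for the example data, and the column names gene1, gene2 and PCC are taken from the question.

```r
# Small stand-in for the example data (matrix() fills column by column).
m <- matrix(c(1, NA, NA, NA,
              NA, 2, NA, NA,
              3, NA, NA, NA,
              4, 4, NA, 4),
            nrow = 4,
            dimnames = list(c("a", "b", "c", "d"), c("a", "b", "c", "d")))

# Row/column positions of every non-NA cell, computed without a loop.
idx <- which(!is.na(m), arr.ind = TRUE)

# Build the edge list once instead of growing it with rbind().
edges <- data.frame(gene1 = rownames(m)[idx[, "row"]],
                    gene2 = colnames(m)[idx[, "col"]],
                    PCC   = m[idx],
                    row.names = NULL,
                    stringsAsFactors = FALSE)

# which() walks the matrix column by column; reorder by row, then column,
# so the result follows the same order as the nested loop in the question.
edges <- edges[order(idx[, "row"], idx[, "col"]), ]
print(edges)
```

The resulting edges data frame can be passed to the same write.table call used in the question. For the full 17804x17804 input, as.matrix(pears) would take the place of m, assuming the file is read so that the gene names end up as row names.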