I have a 74x74 pairwise distance matrix of SNP differences in which the first row and column hold the isolate numbers. A 5x5 excerpt looks like this:
26482RR 25638 26230 25689RR 25954
26482RR 0 8 0 6 0
25638 8 0 8 14 8
26230 0 8 0 6 0
25689RR 6 14 6 0 6
25954 0 8 0 6 0
M = structure(c(0L, 8L, 0L, 6L, 0L, 8L, 0L, 8L, 14L, 8L, 0L, 8L,
0L, 6L, 0L, 6L, 14L, 6L, 0L, 6L, 0L, 8L, 0L, 6L, 0L), .Dim = c(5L,
5L), .Dimnames = list(c("26482RR", "25638", "26230", "25689RR",
"25954"), c("26482RR", "25638", "26230", "25689RR", "25954")))
I would like to convert this matrix into a table of SNP differences for each pair of isolates, like so:
Col Row SNP differences
26482RR 25638 8
26482RR 26230 0
26482RR 25689RR 6
26482RR 25954 0
25638 26230 8
25638 25689RR 14
25638 25954 8
...
in order to plot this data and correlate it with other matrices. I am a beginner in R, so after some searching I tried the following code:
st1076 <- read.csv("st1076.csv", header=TRUE, sep=";")
m1 <- as.matrix(st1076)
m1 <- m1[upper.tri(m1)] <- NA
m1_melted <- reshape2:::melt.matrix(m1, na.rm = TRUE)
colnames(m1_melted) <- c("Col","Row","SNP differences")
However, with this code the "Col" column contains each isolate's position in order of occurrence (1, 2, 3, 4, ...) rather than its isolate number:
Col Row SNP differences
2 X26482RR 8
3 X26482RR 0
4 X26482RR 6
From what I saw in other related questions, using melt.matrix should solve this problem, but it didn't work for me.
Can anyone help me understand why this happened? Do you have any suggestions on how to overcome it?
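I think your approach was correct except for reading the CSV. read.csv returns a data frame, and with its defaults the isolate IDs end up in an ordinary column while header names that start with a digit get an X prefix, so some processing is required to get a matrix with the isolate IDs as row and column names. Note also that m1 <- m1[upper.tri(m1)] <- NA overwrites m1 with a single NA, because the value of an assignment is the assigned value; replace() below avoids that: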
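# row.names=1: first column becomes the row names; check.names=FALSE: keep IDs like "26482RR" without an X prefix
DF <- read.csv("st1076.csv", sep=";", row.names=1, check.names=FALSE)
M <- as.matrix(DF)  # numeric matrix with the isolate IDs as dimnames
# set the upper triangle to NA, then drop those cells while melting
res <- reshape2::melt(replace(M, upper.tri(M), NA),
varnames = c("Col", "Row"),
value.name = "SNP differences",
na.rm = TRUE
)
head(res)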
Col Row SNP differences
1 26482RR 26482RR 0
2 25638 26482RR 8
3 26230 26482RR 0
4 25689RR 26482RR 6
5 25954 26482RR 0
6 25692 26482RR 2
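The head(res) output above still contains self-pairs such as 26482RR vs 26482RR. If you want exactly the layout from the question (each pair once, no diagonal), here is a base-R sketch that swaps reshape2::melt for which(..., arr.ind = TRUE). It uses the 5x5 dput matrix M from the question; the names idx and pairs are hypothetical, and this is one possible approach rather than the method used above.

```r
# 5x5 excerpt from the question (dput output)
M <- structure(c(0L, 8L, 0L, 6L, 0L, 8L, 0L, 8L, 14L, 8L, 0L, 8L,
  0L, 6L, 0L, 6L, 14L, 6L, 0L, 6L, 0L, 8L, 0L, 6L, 0L), .Dim = c(5L,
  5L), .Dimnames = list(c("26482RR", "25638", "26230", "25689RR",
  "25954"), c("26482RR", "25638", "26230", "25689RR", "25954")))

# Strict upper triangle (diag = FALSE by default): each unordered pair once,
# no self-pairs. which() returns a two-column matrix with columns "row", "col".
idx <- which(upper.tri(M), arr.ind = TRUE)

# which() walks the matrix column by column; reorder so that all pairs of the
# first isolate come first, as in the desired table.
idx <- idx[order(idx[, "row"], idx[, "col"]), , drop = FALSE]

pairs <- data.frame(
  Col = rownames(M)[idx[, "row"]],
  Row = colnames(M)[idx[, "col"]],
  "SNP differences" = M[idx],  # matrix indexing: one value per (row, col) pair
  check.names = FALSE,
  stringsAsFactors = FALSE
)
print(pairs)
```

If you prefer to stay with reshape2, replace(M, upper.tri(M, diag = TRUE), NA) in the answer's code should also drop the self-pairs; the pairs then come out with Col and Row swapped relative to the layout in the question.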
For reference, I started with this thread https://stat.ethz.ch/pipermail/r-help/2010-May/237835.html and then consulted the help file ?read.csv
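To see why the question's output showed 1, 2, 3, ... and X-prefixed names, here is a minimal sketch that reads a small in-memory CSV both with the defaults and with the options used in the answer. It assumes the file is semicolon-separated with an empty first header cell; the contents are reconstructed from the 5x5 excerpt in the question, and csv_lines and DF_default are hypothetical names.

```r
# Hypothetical file contents: assumed layout (";" separator, empty first
# header cell), values taken from the 5x5 excerpt in the question.
csv_lines <- c(
  ";26482RR;25638;26230;25689RR;25954",
  "26482RR;0;8;0;6;0",
  "25638;8;0;8;14;8",
  "26230;0;8;0;6;0",
  "25689RR;6;14;6;0;6",
  "25954;0;8;0;6;0"
)

# Defaults: the IDs stay in an ordinary column named "X", and header names
# starting with a digit get an "X" prefix (check.names = TRUE).
DF_default <- read.csv(text = csv_lines, sep = ";")
print(names(DF_default))
# No row names, so melt() falls back to row positions 1, 2, 3, ...
print(rownames(as.matrix(DF_default)))

# As in the answer: column 1 becomes the row names, header names kept as-is.
DF <- read.csv(text = csv_lines, sep = ";", row.names = 1, check.names = FALSE)
M <- as.matrix(DF)
print(dimnames(M))
```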