My problem (today) is as follows:
I have an upper triangular distance matrix in a text file ("dist.dis"), generated by a third-party program. I want to read it into R, run a cluster analysis and draw a dendrogram:
0.36364 0.36364 0.27273 0.81818 0.54545 0.63636 0.36364 0.45455
0.18182 0.63636 0.63636 0.36364 0.63636 0.54545 0.09091
0.45455 0.63636 0.18182 0.63636 0.54545 0.27273
0.81818 0.63636 0.81818 0.27273 0.72727
0.45455 0.18182 0.63636 0.54545
0.45455 0.54545 0.27273
0.81818 0.54545
0.45455
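The file stores the upper triangle row by row with the diagonal left out, so it holds n(n-1)/2 values (36 for the 9 objects here). Filling the lower triangle of an n x n matrix column by column visits the same pairs in the same order. Below is a minimal sketch that assumes that layout. The helper name `upper_to_dist()` is hypothetical, and the 4-object toy data is made up for illustration:

```r
# Hypothetical helper (not part of base R): convert the values of an
# upper-triangular distance file (row by row, no diagonal) into a "dist" object.
upper_to_dist <- function(values, labels = NULL) {
  k <- length(values)
  # k = n(n-1)/2  =>  n = (1 + sqrt(1 + 8k)) / 2
  n <- (1 + sqrt(1 + 8 * k)) / 2
  if (n != round(n)) stop("number of values is not n(n-1)/2 for any integer n")
  mat <- matrix(0, n, n)
  # lower triangle, column-major order == upper triangle, row-major order
  mat[lower.tri(mat)] <- values
  if (!is.null(labels)) rownames(mat) <- labels  # as.dist() copies these as Labels
  as.dist(mat)
}

# Made-up 4-object example in the same layout as dist.dis
vals <- scan(text = "0.1 0.2 0.3
0.4 0.5
0.6", quiet = TRUE)
d <- upper_to_dist(vals, labels = c("A", "B", "C", "D"))
print(d)
```

With the question's files this would be called as `upper_to_dist(scan("dist.dis"), scan("dist.nam", what = "character"))`. Because n is computed from the data, the size no longer has to be hard-coded as 9.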
In a separate text file ("dist.nam"), I also have a list of names of the objects among which the distances have been computed:
COOKO-A
COOKO-B
COOKO-C
COOKO-D
COOKO-E
COOKO-F
COOKO-G
COOKO-H
COOKO-I
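`scan(what = "character")` splits on any whitespace, not only on line breaks. That is fine for names like COOKO-A, but a name containing a space would turn into two entries. `readLines()` keeps exactly one name per line. In the small sketch below, the third name (with a space) is hypothetical and does not come from dist.nam:

```r
# Simulated names file; "COOKO C" is a hypothetical name containing a space
txt <- "COOKO-A\nCOOKO-B\nCOOKO C"

via_scan <- scan(text = txt, what = "character", quiet = TRUE)  # splits on whitespace

con <- textConnection(txt)
via_lines <- readLines(con)  # one element per line
close(con)

length(via_scan)   # 4: "COOKO C" became "COOKO" and "C"
length(via_lines)  # 3
```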
Here is my R code to read the above matrix and generate a dendrogram:
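mat <- matrix(0, 9, 9)
# dist.dis holds the 36 off-diagonal values row by row, i.e. the lower
# triangle column by column; '>' selects exactly those 36 cells ('>=' would select 45)
mat[row(mat) > col(mat)] <- scan("dist.dis")
# method="average" is UPGMA
hc <- hclust(as.dist(mat), method="average")
ppi <- 100
png("clus.png", width=6*ppi, height=6*ppi, res=ppi)
plot(as.dendrogram(hc), xlab="Distance", ylab="", main="UPGMA dendrogram", horiz=TRUE, edgePar=list(col="blue", lwd=3))
dev.off()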
This code works and generates a dendrogram whose tips are labelled with the object numbers (1 to 9).
However, I want to have the names of the objects (instead of their numbers) at the tips of the dendrogram. To achieve this, I tried the code below:
names <- scan("dist.nam", what="character")
df.dist <- as.dist(mat)
df.dist <- as.matrix(df.dist, labels=TRUE)
colnames(df.dist) <- names
rownames(df.dist) <- names
hc <- hclust(as.dist(mat), method="average")
But then I got a dreadful error: "Error in if (is.na(n) || n > 65536L) stop("size cannot be NA nor exceed 65536") : missing value where TRUE/FALSE needed".
Could someone give me a hand?
My suspicion is that this error comes from calling hclust on a matrix (for example df.dist, which as.matrix turned back into a matrix) rather than on a dist object.
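One way to test this suspicion is on a small, made-up 3 x 3 matrix (not the question's data). `hclust()` takes the number of objects from the "Size" attribute, which only dist objects carry. A plain matrix therefore makes it fail, and converting with `as.dist()` first fixes it. The exact error text depends on the R version, so the sketch only checks that an error occurs:

```r
# Made-up symmetric distance matrix with column names only
m <- matrix(c(0,   0.2, 0.4,
              0.2, 0,   0.3,
              0.4, 0.3, 0),
            nrow = 3, dimnames = list(NULL, c("X", "Y", "Z")))

inherits(m, "dist")  # FALSE: a plain matrix has no "Size" attribute

# hclust() on the matrix fails; capture the message instead of stopping
res <- tryCatch(hclust(m), error = function(e) conditionMessage(e))
is.character(res)    # TRUE: an error was caught

# Converting to a dist object first works, and the colnames become tip labels
hc <- hclust(as.dist(m), method = "average")
hc$labels            # "X" "Y" "Z"
```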
I would set the names on the matrix mat and then call as.dist (note that you only need to set the colnames, not both the row and column names). Let me know if this works for you.
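mat <- matrix(0, 9, 9)
# fill the lower triangle; strict '>' because the file has 36 values and no diagonal
mat[row(mat) > col(mat)] <- scan("dist.dis")
names <- scan("dist.nam", what="character")
# as.dist() takes its labels from rownames, or from colnames when rownames are NULL
colnames(mat) <- names
df.dist <- as.dist(mat)
hc <- hclust(df.dist, method="average")
ppi <- 100
png("clus.png", width=6*ppi, height=6*ppi, res=ppi)
# plot margins in lines of text: bottom, left, top, right
par(mar=c(4,4,4,4))
plot(as.dendrogram(hc), xlab="Distance", ylab="", main="UPGMA dendrogram", horiz=TRUE, edgePar=list(col="blue", lwd=3))
dev.off()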
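To check the approach without the two files, the question's distances and names can be inlined with `scan(text = ...)`. The sketch below does this and also shows an alternative that is not part of the answer above: clustering unnamed distances and then assigning the `labels` component of the hclust result directly. Those labels go in the original object order, not the drawing order. The variable names `dist_txt`, `obj.names` and `hc_alt` are only illustrative:

```r
# The question's distances (upper triangle, row by row, no diagonal)
dist_txt <- "
0.36364 0.36364 0.27273 0.81818 0.54545 0.63636 0.36364 0.45455
0.18182 0.63636 0.63636 0.36364 0.63636 0.54545 0.09091
0.45455 0.63636 0.18182 0.63636 0.54545 0.27273
0.81818 0.63636 0.81818 0.27273 0.72727
0.45455 0.18182 0.63636 0.54545
0.45455 0.54545 0.27273
0.81818 0.54545
0.45455"
obj.names <- paste0("COOKO-", LETTERS[1:9])  # same names as in dist.nam

mat <- matrix(0, 9, 9)
# strict '>': 36 off-diagonal values fill the 36 lower-triangle cells
mat[row(mat) > col(mat)] <- scan(text = dist_txt, quiet = TRUE)
colnames(mat) <- obj.names         # picked up by as.dist() as Labels
df.dist <- as.dist(mat)
hc <- hclust(df.dist, method = "average")
tips <- labels(as.dendrogram(hc))  # tip labels in drawing order

# Alternative: cluster unnamed distances, then relabel the hclust object
hc_alt <- hclust(as.dist(unname(mat)), method = "average")
hc_alt$labels <- obj.names         # original object order
tips_alt <- labels(as.dendrogram(hc_alt))
```

Both routes should produce the same tip labels. Setting names before `as.dist()` keeps them attached to the distances, while relabelling the hclust object is handy when the dist object comes from elsewhere.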