I'd like to create a heat map of one variable against itself, but my data is not in matrix format. I have the PCA1 and PCA2 coordinates of each item, and I'd like to know how to build a heat map from them. This is what my data looks like (cluster is a k-means cluster assignment):
ID PCA1 PCA2 cluster
echocardiography -0.88 0.87 9
infarction -0.18 0.57 7
carotid 1.13 -0.80 2
aorta -0.03 -0.06 5
myocardial -0.72 -0.02 3
hemorrhage 0.23 -0.67 5
So basically I want a heat map between the IDs that shows how closely related each pair of IDs is (possibly measured by the distance between their PCA coordinates).
Note: the heat map should look something like this (as opposed to a density heat plot):
Here is a possible solution. I hope it helps.
df <- structure(list(ID = structure(c(3L, 5L, 2L, 1L, 6L, 4L), .Label = c("aorta",
"carotid", "echocardiography", "hemorrhage", "infarction", "myocardial"
), class = "factor"), PCA1 = c(-0.88, -0.18, 1.13, -0.03, -0.72,
0.23), PCA2 = c(0.87, 0.57, -0.8, -0.06, -0.02, -0.67), cluster = c(9L,
7L, 2L, 5L, 3L, 5L)), .Names = c("ID", "PCA1", "PCA2", "cluster"
), class = "data.frame", row.names = c(NA, -6L))
# Define a distance function based on euclidean norm
# calculated between PCA values of the i-th and j-th items
dst <- Vectorize(
  function(i, j, dtset) sqrt(sum((dtset[i, 2:3] - dtset[j, 2:3])^2)),
  vectorize.args = c("i", "j")
)
# Here is the distance between echocardiography and infarction
dst(1,2,df)
# [1] 0.7615773
# This value is given by
sqrt(sum((df[1,2:3] - df[2,2:3])^2))
# Calculate the distance matrix
nr <- nrow(df)
mtx <- outer(1:nr, 1:nr, "dst", dtset=df)
colnames(mtx) <- rownames(mtx) <- df[,1]
# Plot the heatmap using ggplot2
library(reshape2)
library(ggplot2)
mtx.long <- melt(mtx)
ggplot(mtx.long, aes(x = Var1, y = Var2, fill = value)) +
  geom_tile() +
  xlab("") +
  ylab("")
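As a shorter alternative, the same Euclidean distance matrix can probably be obtained with base R's `dist()`, which avoids the hand-written `dst` function and the `outer()` call. The sketch below rebuilds the six rows from the question by hand so that it runs on its own; the names `pca`, `d`, `d.long` and the column name `distance` are my own choices for illustration, not part of the original code. Note that it plots distance, so small values mean closely related items.

```r
library(ggplot2)

# Same six items as in the question, rebuilt so the sketch is self-contained
pca <- data.frame(
  ID   = c("echocardiography", "infarction", "carotid",
           "aorta", "myocardial", "hemorrhage"),
  PCA1 = c(-0.88, -0.18, 1.13, -0.03, -0.72, 0.23),
  PCA2 = c(0.87, 0.57, -0.80, -0.06, -0.02, -0.67)
)

# dist() computes pairwise Euclidean distances between rows by default
d <- as.matrix(dist(pca[, c("PCA1", "PCA2")]))
dimnames(d) <- list(pca$ID, pca$ID)

# Long format without reshape2: columns Var1, Var2 and distance
d.long <- as.data.frame(as.table(d), responseName = "distance")

p <- ggplot(d.long, aes(x = Var1, y = Var2, fill = distance)) +
  geom_tile() +
  xlab("") +
  ylab("")
print(p)
```

Here `d["echocardiography", "infarction"]` should match the `dst(1,2,df)` value computed above. Using `as.data.frame(as.table(d))` is only one way to reshape the matrix; `melt(d)` from reshape2 should work just as well.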