r matrix heatmap text-mining pca

Create a heat map from PCA coordinates in R


I'd like to create a heat map of one variable against itself, but my data is not in matrix format. I have the PCA1 and PCA2 coordinates of each item, and I'd like to know how to create a heat map from them. This is what my data looks like (cluster is a k-means cluster assignment):

ID                     PCA1             PCA2          cluster
echocardiography       -0.88            0.87          9
infarction             -0.18            0.57          7
carotid                1.13             -0.80         2
aorta                  -0.03            -0.06         5
myocardial             -0.72            -0.02         3
hemorrhage             0.23             -0.67         5

So basically I want a heat map between the IDs that shows how correlated each pair of IDs is, possibly by using the distance between their PCA coordinates.

Note: the heat map should look something like this (as opposed to a density heat plot): [image not included]


Solution

  • Here is a possible solution; I hope it helps.

    df <- structure(list(ID = structure(c(3L, 5L, 2L, 1L, 6L, 4L), .Label = c("aorta", 
    "carotid", "echocardiography", "hemorrhage", "infarction", "myocardial"
    ), class = "factor"), PCA1 = c(-0.88, -0.18, 1.13, -0.03, -0.72, 
    0.23), PCA2 = c(0.87, 0.57, -0.8, -0.06, -0.02, -0.67), cluster = c(9L, 
    7L, 2L, 5L, 3L, 5L)), .Names = c("ID", "PCA1", "PCA2", "cluster"
    ), class = "data.frame", row.names = c(NA, -6L))
    
    # Define a distance function based on euclidean norm
    # calculated between PCA values of the i-th and j-th items
    dst <- Vectorize(function(i,j,dtset) sqrt(sum((dtset[i,2:3]-dtset[j,2:3])^2)), vectorize.args=c("i","j"))
    
    # Here is the distance between echocardiography and infarction
    dst(1,2,df)
    # [1] 0.7615773
    # This value is given by
    sqrt(sum((df[1,2:3] - df[2,2:3])^2))
    
    # Calculate the distance matrix
    nr <- nrow(df)
    mtx <- outer(1:nr, 1:nr, "dst", dtset=df)
    colnames(mtx) <- rownames(mtx) <- df[,1]
    
    # Plot the heatmap using ggplot2
    library(reshape2)
    library(ggplot2)
    mtx.long <- melt(mtx)
    ggplot(mtx.long, aes(x = Var1, y = Var2, fill = value)) + geom_tile() + xlab("") + ylab("")
    

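As a shorter alternative to the hand-written `dst` function, base R's `dist()` should produce the same Euclidean distance matrix directly from the two PCA columns. The sketch below rebuilds the sample data with plain character IDs. The names `coords`, `mtx2` and `sim` are only illustrative, and the rescaling to a 0-1 "similarity" is an assumption about what "how correlated" could mean here, not a real correlation.

```r
# Sample data from the question (IDs kept as character, not factor)
df <- data.frame(
  ID   = c("echocardiography", "infarction", "carotid",
           "aorta", "myocardial", "hemorrhage"),
  PCA1 = c(-0.88, -0.18, 1.13, -0.03, -0.72, 0.23),
  PCA2 = c(0.87, 0.57, -0.80, -0.06, -0.02, -0.67),
  stringsAsFactors = FALSE
)

# Numeric matrix of the PCA coordinates, labelled by ID
coords <- as.matrix(df[, c("PCA1", "PCA2")])
rownames(coords) <- df$ID

# dist() returns the pairwise Euclidean distances between rows;
# as.matrix() expands them into a full symmetric matrix with dimnames
mtx2 <- as.matrix(dist(coords, method = "euclidean"))

# Distance between echocardiography and infarction
# (same pair as dst(1, 2, df) in the answer)
mtx2["echocardiography", "infarction"]

# Assumption: rescale distances so that 1 = identical coordinates
# and 0 = the most distant pair; this is a similarity, not a correlation
sim <- 1 - mtx2 / max(mtx2)
```

Either `mtx2` or `sim` can then be passed to `melt()` and plotted with `geom_tile()` exactly as above.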