
R - Add a column to a heatmap that is independent of it to visually see correlations between grouping and a response variable


I have a large data frame (800 rows, 30 columns) and a response vector of length 800. I'd like to draw a heatmap of the data frame with the response shown alongside it as an extra column, colored independently: the response variable is on a very different scale, so I don't want to use the same coloring scheme for both. Currently my heatmap looks like this (the vertical lines were added as part of my analysis; they are not part of the heatmap).

[Image: the current heatmap]

Here is a small-scale reproducible example:

mydf <- as.matrix(data.frame(A = sample(10, 20, replace = TRUE), 
                             B = sample(10, 20, replace = TRUE),
                             C = sample(10, 20, replace = TRUE),
                             D = sample(10, 20, replace = TRUE),
                             E = sample(10, 20, replace = TRUE)))

response <- sample(100, 20, replace = TRUE)
mydf_order <- hclust(dist(mydf))$order
heatmap(mydf[mydf_order,], Rowv = NA, Colv = NA, labRow = as.character(mydf_order))

I'm able to produce a heatmap of the hierarchically clustered mydf, but I would also like to display the response column alongside it with an independent coloring scheme, so that I can see whether the grouping in mydf corresponds to response.

Thank you


Solution

  • One approach is to map your response variable to a color gradient, e.g. using the {scales} package:

    
        library(scales)
        
        response_colors <- colour_ramp(c("blue", "red"))(response / max(response))
        
        ## > head(response_colors)
        ## [1] "#FA001C" "#D2007B" "#DF0062" "#AF00AD" "#E40058" "#4F00F2"
    
    

    Then, use these for the RowSideColors argument (make sure the order corresponds to that of the reordered mydf):

    
        heatmap(mydf[mydf_order,], 
                Rowv = NA,
                Colv = NA, 
                labRow = as.character(mydf_order),
                RowSideColors = response_colors[mydf_order]  # reorder colors to match the reordered rows
        )
    
    

    Result:

    [Image: heatmap with separate color scale for independent response]
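
For reference, the whole approach can also be sketched in base R only, without {scales}. This is a variant under stated assumptions, not the answer's exact method: `colorRampPalette()` stands in for `colour_ramp()`, the response is min-max scaled (rather than divided by its maximum) so that the full blue-to-red range is used, and the seed, `n_colors`, and `scaled` are names introduced here for illustration. The key point it shows is that the side colors are indexed with the same `mydf_order` as the matrix rows.

```r
set.seed(42)  # assumption: fixed seed so the example is repeatable

# Same shape as the example in the question: 20 rows, 5 columns
mydf <- matrix(sample(10, 100, replace = TRUE), nrow = 20,
               dimnames = list(NULL, LETTERS[1:5]))
response <- sample(100, 20, replace = TRUE)

# Row order from hierarchical clustering, as in the question
mydf_order <- hclust(dist(mydf))$order

# Build a blue-to-red palette with base R (stand-in for scales::colour_ramp)
n_colors <- 100
palette <- colorRampPalette(c("blue", "red"))(n_colors)

# Min-max scale the response to [0, 1], then pick the matching palette entry
# (assumes the response is not constant, otherwise this divides by zero)
scaled <- (response - min(response)) / (max(response) - min(response))
response_colors <- palette[1 + round(scaled * (n_colors - 1))]

# Reorder the colors with the same index used for the matrix rows
heatmap(mydf[mydf_order, ],
        Rowv = NA,
        Colv = NA,
        labRow = as.character(mydf_order),
        RowSideColors = response_colors[mydf_order])
```

Because `Rowv = NA` keeps the rows in the order they are passed, `RowSideColors` has to follow that same order; passing the unordered color vector would pair each row with another row's response.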