Search code examples
rheatmappheatmap

Using pheatmap to sort data by row annotations?


I'm trying to create a heatmap with columns of test data and rows of individual study participants. The participants can be classified into three distinct groups. I'd like to annotate the plot with the three groups and then cluster the data within each group to understand the differences between them.

I'm new to creating heatmaps, and I can't get the row annotations to work. I'm also not sure how to cluster only within each group once I do get the annotations working. I was thinking that the package "pheatmap.type" would work, but unfortunately, it's not available for R version 4.0.2.

I can't post exact data (confidential) but I've attached and example file and I'll describe what I've done so far and post the code. I have a data frame with the first column as labels that include the participant ID and the group (did this using row.names=1) and then 12 columns with numeric data (no NA's). I then ordered the data by the row names and used the scale function to scale the data and generate a matrix. I then tried to create an annotation row by adding the group info to a data frame in several different ways. What I've tried so far is below:

#dataframe with Group and ID as row names and 12 numerical columns  

df_1_HM <- data.frame(df_1$Group_ID, df_1$Test1, df_1$Test2, df_1$Test3, df_1$Test4, df_1$Test5, df_1$Test6, df_1$Test7, df_1$Test8, df_1$Test9, df_1$Test10, df_1$Test11, df_1$Test12, row.names=1)

#ordering the dataframe so that the groups are in order 
df_1_HM_ordered <- df_1_HM[ order(row.names(df_1_HM)), ]

#Z-scoring (scaling) data 
df_HM_matrix_1 <- scale(df_1_HM)

#creating a color palette 
my_palette <- colorRampPalette(c("white", "grey", "black"))(n = 100)


#Plotting heatmap 
install.packages("gplots")
library(gplots)

#trying to plot the heatmap with annotation_row data 
#The method below does not work for me. The plot will run with no errors but does not actually plot - it ends up becoming a list of 4 with no data.

pheatmap(df_HM_matrix_1,
         scale="none",
         color=my_palette,
         fontsize=14, 
         annotation_row=annotation_row)

annotation_row = data.frame(
  df_Group = factor(rep(c("Group 1", "Group 2", "Group 3"), c(11, 10, 7)))
)

rownames(annotation_row) = paste("df_Group", 1:28, sep = "")

rownames(annotation_row) = rownames(df_HM_matrix_1) # name matching

#I also tried to use a dataframe with just the groups as column 1 to get row annotation 
pheatmap(df_HM_matrix_1,
         scale="none",
         color=my_palette,
         fontsize=14, 
         annotation_row=df_Group)

df_Group <- data.frame(df_1$Group, df_1$ID)

#Also tried using the select function to create a dataframe for the row annotation 
df_Group_1 <- select(df_1, Group) 

#When I use either of the data frame methods above I get the following error: Error in cut.default(a, breaks = 100) : 'x' must be numeric

Any help with this at all would be awesome!!

Here is the example data:

structure(list(Group_ID = structure(1:28, .Label = c("Group1_10", 
"Group1_13", "Group1_15", "Group1_2", "Group1_20", "Group1_26", 
"Group1_27", "Group1_3", "Group1_6", "Group1_8", "Group2_1", 
"Group2_12", "Group2_14", "Group2_16", "Group2_21", "Group2_23", 
"Group2_25", "Group2_28", "Group2_7", "Group2_9", "Group3_11", 
"Group3_17", "Group3_18", "Group3_19", "Group3_24", "Group3_4", 
"Group3_5", "Group3_6"), class = "factor"), Test1 = c(1.44, 4.36, 
0.75, 0.59, 1.67, 0.41, 2.42, 0.57, 0.89, 0.45, 0.31, 1.56, 2.13, 
0.86, 0.12, 0.26, 1.47, 2.64, 3.92, 2.19, 0.43, 0.98, 1.93, 1.49, 
1.43, 2.58, 2.49, 2.64), Test2 = c(1.44, 4.36, 0.75, 0.59, 1.67, 
0.41, 2.42, 0.57, 0.89, 0.45, 0.31, 1.56, 2.13, 0.86, 0.12, 0.26, 
1.47, 2.64, 3.92, 2.19, 0.43, 0.98, 1.93, 1.49, 1.43, 2.58, 2.49, 
2.64), Test3 = c(1.44, 4.36, 0.75, 0.59, 1.67, 0.41, 2.42, 0.57, 
0.89, 0.45, 0.31, 1.56, 2.13, 0.86, 0.12, 0.26, 1.47, 2.64, 3.92, 
2.19, 0.43, 0.98, 1.93, 1.49, 1.43, 2.58, 2.49, 2.64), Test4 = c(1.44, 
4.36, 0.75, 0.59, 1.67, 0.41, 2.42, 0.57, 0.89, 0.45, 0.31, 1.56, 
2.13, 0.86, 0.12, 0.26, 1.47, 2.64, 3.92, 2.19, 0.43, 0.98, 1.93, 
1.49, 1.43, 2.58, 2.49, 0.31), Test5 = c(1.44, 4.36, 0.75, 0.59, 
1.67, 0.41, 2.42, 0.57, 0.89, 0.45, 0.31, 1.56, 2.13, 0.86, 0.12, 
0.26, 1.47, 2.64, 3.92, 2.19, 0.43, 0.98, 1.93, 1.49, 1.43, 2.58, 
2.49, 0.31), Test6 = c(1.44, 4.36, 0.75, 0.59, 1.67, 0.41, 2.42, 
0.57, 0.89, 0.45, 0.31, 1.56, 2.13, 0.86, 0.12, 0.26, 1.47, 2.64, 
3.92, 2.19, 0.43, 0.98, 1.93, 1.49, 1.43, 2.58, 2.49, 0.31), 
    Test7 = c(1.44, 4.36, 0.75, 0.59, 1.67, 0.41, 2.42, 0.57, 
    0.89, 0.45, 0.31, 1.56, 2.13, 0.86, 0.12, 0.26, 1.47, 2.64, 
    3.92, 2.19, 0.43, 0.98, 1.93, 1.49, 1.43, 2.58, 2.49, 1.49
    ), Test8 = c(1.44, 4.36, 0.75, 0.59, 1.67, 0.41, 2.42, 0.57, 
    0.89, 0.45, 0.31, 1.56, 2.13, 0.86, 0.12, 0.26, 1.47, 2.64, 
    3.92, 2.19, 0.43, 0.98, 1.93, 1.49, 1.43, 2.58, 2.49, 1.49
    ), Test9 = c(1.44, 4.36, 0.75, 0.59, 1.67, 0.41, 2.42, 0.57, 
    0.89, 0.45, 0.31, 1.56, 2.13, 0.86, 0.12, 0.26, 1.47, 2.64, 
    3.92, 2.19, 0.43, 0.98, 1.93, 1.49, 1.43, 2.58, 2.49, 1.49
    ), Test10 = c(1.44, 4.36, 0.75, 0.59, 1.67, 0.41, 2.42, 0.57, 
    0.89, 0.45, 0.31, 1.56, 2.13, 0.86, 0.12, 0.26, 1.47, 2.64, 
    3.92, 2.19, 0.43, 0.98, 1.93, 1.49, 1.43, 2.58, 2.49, 3.92
    ), Test11 = c(1.44, 4.36, 0.75, 0.59, 1.67, 0.41, 2.42, 0.57, 
    0.89, 0.45, 0.31, 1.56, 2.13, 0.86, 0.12, 0.26, 1.47, 2.64, 
    3.92, 2.19, 0.43, 0.98, 1.93, 1.49, 1.43, 2.58, 2.49, 3.92
    ), Test12 = c(1.44, 4.36, 0.75, 0.59, 1.67, 0.41, 2.42, 0.57, 
    0.89, 0.45, 0.31, 1.56, 2.13, 0.86, 0.12, 0.26, 1.47, 2.64, 
    3.92, 2.19, 0.43, 0.98, 1.93, 1.49, 1.43, 2.58, 2.49, 3.92
    )), class = "data.frame", row.names = c(NA, -28L))

Solution

  • For annotations to work with pheatmap, factors must be ordered. To do this, add ordered = TRUE to factor():

    annotation_row = data.frame(df_Group = factor(rep(c("Group 1", "Group 2", "Group 3"), c(11, 10, 7)), ordered = TRUE))
    

    You could also use as.ordered() to accomplish the same thing.

    To sort your heatmap row by annotation group, just add the argument cluster_rows = F to pheatmap():

    pheatmap(df_HM_matrix_1,
             scale="none",
             color=my_palette,
             fontsize=14, 
             annotation_row=annotation_row,
             cluster_rows = F)
    

    And here is what it looks like now: enter image description here