r dataframe dendrogram hclust complexheatmap

External dendrogram does not keep the same structure when used for cluster_rows in ComplexHeatmap


I am trying to create a heatmap with an external dendrogram using the ComplexHeatmap library.

df <- data.frame(genes=c("G1","G2","G3","G4","G5","G6","G7","G8","G9","G10",
                         "G11","G1","G12","G4","G15","G6","G17","G8","G1","G2"),
                 rel=c("A111SD","G422ER","A112SA","B457EE","B33","N124A","F124A",
                       "G900GG","I332LP","I332LO",
                       "M332LP","A322TR","C14SA","B467ET","Z653","R124T","F334A",
                       "G901GZ","R330TP","L982LP"))
df_dist <- stringdist::stringdistmatrix(df$rel,useNames = T)
df_hclust <- hclust(df_dist)                 
plot(df_hclust)

library(tidyr)  # provides spread()
df$Id <- seq.int(nrow(df))
df <- spread(df,genes,Id)
rownames(df) <- df$rel
df$rel<- NULL
df[!is.na(df)] <- 1
df[is.na(df)] <- 0

if (!require("BiocManager", quietly = TRUE))
  install.packages("BiocManager")

BiocManager::install("ComplexHeatmap")
library(ComplexHeatmap)

Heatmap(as.matrix(df),cluster_rows = df_hclust,cluster_columns = F)

The created heatmap has a different dendrogram from the one built using plot().

[Image: plot created from the plot() command]


Solution

  • The problem is that after all the transformations:

    df$Id <- seq.int(nrow(df))
    df <- spread(df,genes,Id)
    rownames(df) <- df$rel
    df$rel<- NULL
    df[!is.na(df)] <- 1
    df[is.na(df)] <- 0
    

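    the rows of the data.frame df are no longer in their original order, while df_hclust was built from df$rel in that original order. As far as I can tell, Heatmap() applies a supplied clustering to the matrix rows by position and does not match row names to labels, so the dendrogram ends up attached to the wrong rows. The quick fix is to add the following line, which restores the original order: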

    df <- df[rownames(as.matrix(df_dist)),]
    

    However, I suggest avoiding reassignment to the same variable, because it easily leads to this kind of problem.
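
A minimal sketch of the same fix without overwriting df, assuming tidyr, stringdist and ComplexHeatmap are installed. The names df_long, df_wide, mat and mat_ordered are illustrative and not from the original post, and useNames = "strings" is used here in place of T. The stopifnot() line is only a sanity check that the matrix rows line up with the clustering labels before plotting.

```r
library(tidyr)           # spread()
library(ComplexHeatmap)  # Heatmap()

df <- data.frame(
  genes = c("G1","G2","G3","G4","G5","G6","G7","G8","G9","G10",
            "G11","G1","G12","G4","G15","G6","G17","G8","G1","G2"),
  rel   = c("A111SD","G422ER","A112SA","B457EE","B33","N124A","F124A",
            "G900GG","I332LP","I332LO",
            "M332LP","A322TR","C14SA","B467ET","Z653","R124T","F334A",
            "G901GZ","R330TP","L982LP")
)

# Cluster the labels in their original order.
df_dist   <- stringdist::stringdistmatrix(df$rel, useNames = "strings")
df_hclust <- hclust(df_dist)

# Build the wide table under new names so df itself stays unchanged.
df_long <- transform(df, Id = seq_len(nrow(df)))
df_wide <- spread(df_long, genes, Id)

# Presence/absence matrix: 1 where a gene is recorded for a rel, 0 otherwise.
mat <- 1 * !is.na(as.matrix(df_wide[, setdiff(names(df_wide), "rel")]))
rownames(mat) <- as.character(df_wide$rel)

# Put the rows back in the order the clustering was computed on.
mat_ordered <- mat[df_hclust$labels, , drop = FALSE]
stopifnot(identical(rownames(mat_ordered), df_hclust$labels))

Heatmap(mat_ordered, cluster_rows = df_hclust, cluster_columns = FALSE)
```

Indexing by df_hclust$labels rather than by position makes the intent explicit: whatever order the reshaping step produces, the rows handed to Heatmap() follow the order used to build the dendrogram.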