Tags: r, ggplot2, colors, scatter-plot, dendrogram

How do I apply same color palette to dendrogram and scatterplot in ggplot2?


I am making a dendrogram and a scatterplot from the same data, both plotted with ggplot2. I use as.dendrogram() to convert the hclust() object into a form suitable for ggplot(), then set the branch and label colors using the set() function from dendextend. I use metaMDS() and scores() from vegan for the NMDS results. I color the points in the NMDS scatterplot based on the groups created by dendextend::cutree(), and I use the same color palette to color the branches and labels in the dendrogram.

However, the colors of the groups do not match between the two plots. Because I applied the same palette to both plots and used the groups from cutree() to color the scatterplot, I assumed each group would get the same color in both.

How can I ensure that both plots use the same colors for the same groups?

Notice in the images below that Milan, Barcelona, and Marseille (as one example) are one color in the dendrogram and another color in the scatterplot. In the dendrogram, Barcelona gets the fourth color from the palette, but cutree() assigns it to group 2, so it gets the second color from the palette in the scatterplot.
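
The mismatch can be inspected directly. The following is a minimal sketch that uses only base R: it calls stats::cutree() rather than dendextend::cutree(), on the assumption that both number the clusters in data order by default. It also assumes that dendextend hands out palette colors in the order the clusters appear along the dendrogram leaves, which is consistent with the Barcelona example above. The names leaf_grp, leaf_cluster_order, and comparison are illustrative only.

```r
# Base R only; eurodist ships with the datasets package.
k <- 5
hc <- hclust(eurodist)

# Cluster ids numbered by first appearance in the data (row) order
grp <- cutree(hc, k = k)

# The same ids, read along the dendrogram leaves instead
leaf_grp <- grp[hc$order]

# Order in which the cluster ids are first met along the leaves.
# If this is not 1, 2, ..., k, then "i-th palette color" means different
# clusters in a plot colored by leaf order and a plot colored by cluster id.
leaf_cluster_order <- unique(leaf_grp)
print(leaf_cluster_order)

# Per-city comparison of the two numbering schemes
comparison <- data.frame(
  city = names(leaf_grp),
  cutree_id = unname(leaf_grp),
  leaf_position_id = match(leaf_grp, leaf_cluster_order)
)
print(comparison)
```

Any city where cutree_id and leaf_position_id differ would be expected to change color between the two plots.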

Dendrogram

NMDS scatterplot

MWE

library(RColorBrewer)
library(dendextend)
library(dplyr)
library(ggplot2)
library(vegan)

kay <- 5
mycolors <- brewer.pal(kay, "Dark2")

euro.hc <- hclust(eurodist)
euro.cut <- dendextend::cutree(euro.hc, k = kay)

# Note: in metaMDS(), k is the number of NMDS dimensions, not the number of clusters
euro.nmds <- metaMDS(eurodist, k = kay)

euro.df <- scores(euro.nmds, display = "sites", tidy = TRUE) %>%
  mutate(grp = as.factor(euro.cut))

dend <- as.dendrogram(euro.hc) %>%
  set("branches_k_color",
    value = mycolors,
    k = kay
  ) %>%
  set("labels_colors",
    value = mycolors,
    k = kay
  ) %>%
  set("branches_lwd", 1.0) %>%
  set("labels_cex", 1)


dend %>% ggplot(horiz = TRUE) +
  scale_x_continuous(expand = c(-1, -1)) +
  scale_y_reverse(expand = c(1, 1)) +
  theme(
    axis.title = element_blank(),
    axis.text = element_blank(),
    axis.ticks = element_blank()
  )

euro.df %>%
  ggplot() +
  geom_point(aes(x = NMDS1, y = NMDS2, color = grp)) +
  geom_text(aes(x = NMDS1, y = NMDS2, label = label, color = grp),
    vjust = -1,
    hjust = .50
  ) +
  scale_colour_manual(values = mycolors, guide = NULL) +
  coord_equal() +
  theme_minimal() +
  theme(
    line = element_blank(),
    axis.text = element_blank()
  )

Created on 2023-09-25 with reprex v2.0.2


Solution

  • After much trial and error, I finally figured out a solution. It works well but seems "hacky" so I would like to learn if there is a better/more efficient method.

    Here's a summary of the changes. The full working code and output are included below.

    1. Set order_clusters_as_data = FALSE in cutree().
    2. Get the city names from the cutree() result and the leaf branch colors via get_leaves_branches_col().
    3. Create a temporary dataframe from these two vectors.
    4. Make the main dataframe after making the dendrogram.
    5. Left join the temporary dataframe to the main dataframe.
    6. Move color outside of aes() and take it from the colr column of the dataframe.

    library(RColorBrewer)
    library(dendextend)
    library(dplyr)
    library(ggplot2)
    library(vegan)
    
    kay <- 5
    mycolors <- brewer.pal(kay, "Dark2")
    
    euro.hc <- hclust(eurodist)
    euro.cut <- dendextend::cutree(euro.hc, k = kay, order_clusters_as_data = FALSE)
    
    # Note: in metaMDS(), k is the number of NMDS dimensions, not the number of clusters
    euro.nmds <- metaMDS(eurodist, k = kay)
    
    dend <- as.dendrogram(euro.hc) %>%
      set("branches_k_color",
        value = mycolors,
        k = kay
      ) %>%
      set("labels_colors",
        value = mycolors,
        k = kay
      ) %>%
      set("branches_lwd", 1.0) %>%
      set("labels_cex", 1)
    
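    # With order_clusters_as_data = FALSE, names(euro.cut) should be in dendrogram
    # leaf order, the same order that get_leaves_branches_col(dend) uses.
    label <- names(euro.cut)
    colr <- get_leaves_branches_col(dend)
    # Lookup table: one color per city label
    tmp.df <- data.frame(label = label, colr = colr)
    euro.df <- scores(euro.nmds, display = "sites", tidy = TRUE) %>%
      left_join(tmp.df, by = "label")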
    
    
    dend %>% ggplot(horiz = TRUE) +
      scale_x_continuous(expand = c(-1, -1)) +
      scale_y_reverse(expand = c(1, 1)) +
      theme(
        axis.title = element_blank(),
        axis.text = element_blank(),
        axis.ticks = element_blank()
      )
    
    euro.df %>%
      ggplot() +
      geom_point(aes(x = NMDS1, y = NMDS2), 
                 color = euro.df$colr) +
      geom_text(aes(x = NMDS1, y = NMDS2, label = label),
        color = euro.df$colr,
        vjust = -1,
        hjust = .50
      ) +
      # No color scale is needed here: the colors are set directly, outside aes()
      coord_equal() +
      theme_minimal() +
      theme(
        line = element_blank(),
        axis.text = element_blank()
      )
    

    Created on 2023-09-26 with reprex v2.0.2
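
A possibly less "hacky" alternative is to keep color = grp inside aes() and pass scale_colour_manual() a named palette that maps each cluster id to the color it receives in the dendrogram. The sketch below rests on two assumptions: that dendextend assigns palette colors to clusters in the order they appear along the leaves, and that the hex codes are the first five "Dark2" colors (hardcoded so the example does not need RColorBrewer). To stay self-contained it also swaps in base R's cmdscale() for vegan::metaMDS(), so the coordinates are classical MDS rather than NMDS. The names leaf_cluster_order, pal, and plot.df are illustrative.

```r
library(ggplot2)

k <- 5
# Assumed to equal brewer.pal(5, "Dark2"); hardcoded to avoid the dependency
mycolors <- c("#1B9E77", "#D95F02", "#7570B3", "#E7298A", "#66A61E")

hc <- hclust(eurodist)
grp <- cutree(hc, k = k)  # cluster ids in data order

# Order in which the cluster ids appear along the dendrogram leaves
leaf_cluster_order <- unique(grp[hc$order])

# Named palette: the i-th color goes to the i-th cluster met along the leaves
pal <- setNames(mycolors, leaf_cluster_order)

# Stand-in 2D coordinates (classical MDS, not the NMDS from the question)
xy <- cmdscale(eurodist, k = 2)
plot.df <- data.frame(
  label = rownames(xy),
  x = xy[, 1],
  y = xy[, 2],
  grp = factor(grp[rownames(xy)])
)

# Named values are matched to the factor levels of grp by name
p <- ggplot(plot.df, aes(x = x, y = y, color = grp, label = label)) +
  geom_point() +
  geom_text(vjust = -1, hjust = 0.5) +
  scale_colour_manual(values = pal, guide = "none") +
  coord_equal() +
  theme_minimal()
print(p)
```

With this approach the temporary dataframe and the join are not needed, and the same pal can be reused for any other plot that colors by the cutree() groups.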