
Plotting a better treemap using ggplot and the geom_rect hack, but how to get the labels correct?


I have found it challenging to create a treemap using ggplot. This blog example captures the issues very well and provides a nice workaround: it takes the output of the treemap package and builds a ggplot version with geom_rect.

My question is how to adjust the labels (and, optionally, the colors) by hierarchy level, since I have more groups than the linked example and different labeling requirements.

Here is a simple reproducible example:

library(tidyverse)
library(treemap)


# Create dummy data
tree_data <- data.frame(
  my_segment = c(
    rep("seg_a", 5),
    rep("seg_b", 6),
    rep("seg_c", 7)),
  my_class = c(
    rep("class_1", 2),
    rep("class_2", 2),
    rep("class_3", 1),
    rep("class_4", 2),
    rep("class_5", 2),
    rep("class_6", 2),
    rep("class_7", 1),
    rep("class_8", 3),
    rep("class_9", 3)),
  my_type = c(
    rep("type_1", 7),
    rep("type_2", 6),
    rep("type_3", 5)),
  vals = round(runif(18, min = 20, max = 100), 0)
)

Here are the first 13 rows of the sample data frame:

   my_segment my_class my_type vals
1       seg_a  class_1  type_1   86
2       seg_a  class_1  type_1   41
3       seg_a  class_2  type_1   23
4       seg_a  class_2  type_1   79
5       seg_a  class_3  type_1   33
6       seg_b  class_4  type_1   82
7       seg_b  class_4  type_1   85
8       seg_b  class_5  type_2   40
9       seg_b  class_5  type_2   83
10      seg_b  class_6  type_2   69
11      seg_b  class_6  type_2   98
12      seg_c  class_7  type_2   91
13      seg_c  class_8  type_2   33

The treemap package runs fine, but its output is unreadable in RStudio, and I'd like to be able to customize the plot further with ggplot (similar to the linked article).

# Run treemap function
tree_p <- treemap(
  tree_data,
  index            = c("my_segment", "my_class", "my_type"),
  vColor           = "my_segment",
  vSize            = "vals",
  type             = "index",
  fontsize.labels  = c(15, 12, 10),
  fontcolor.labels = c("white", "orange", "green"),
  fontface.labels  = c(2, 1, 1),
  bg.labels        = 0,
  align.labels     = list(
    c("center", "center"),
    c("right", "bottom"),
    c("left", "bottom")
  ),
  overlap.labels   = 0.5,
  inflate.labels   = FALSE
)
    
# Note:  unreadable output in Rstudio (too small)

[Figure: treemap_output]

The problem arises when I use the workaround from the blog but add an additional hierarchy level and want to change the labeling.

# Create the plot in ggplot using geom_rect

# Get underlying data created from running treemap
tm_plot_data <- tree_p$tm %>% 
  mutate(x1 = x0 + w,
         y1 = y0 + h) %>% 
  mutate(x = (x0+x1)/2,
         y = (y0+y1)/2) %>% 
  mutate(
    primary_group = case_when(
      level == 1 ~ 1.5,
      level == 2 ~ 0.75,
      TRUE       ~ 0.5
    )
  ) 



# Plot
ggplot(tm_plot_data, aes(xmin = x0, ymin = y0, xmax = x1, ymax = y1)) + 
  # add fill and borders for groups and subgroups
  geom_rect(aes(fill = color, size = primary_group),
            show.legend = FALSE,
            color       = "black",
            alpha       = 0.3
  ) +
  scale_fill_identity() +
  # set thicker lines for group borders
  scale_size(range = range(tm_plot_data$primary_group)) +
  # add labels
  ggfittext::geom_fit_text(aes(label = my_segment), color = "white", min.size = 1) +
  ggfittext::geom_fit_text(aes(label = my_class), color = "blue", min.size = 1) +
  ggfittext::geom_fit_text(aes(label = my_type), color = "red", min.size = 1) +
  # options
  scale_x_continuous(expand = c(0, 0)) +
  scale_y_continuous(expand = c(0, 0)) +
  theme_void()

[Figure: treemap_hack]

So my question is: is there a way to create the labeling the way treemap does? Specifically, seg_a, seg_b, and seg_c should each appear only once, centered over the area of their respective segments. I'd also like to move the labels so that they do not overlap.

Thanks for any help and suggestions!


Solution

  • The issue is that you use your full dataset tm_plot_data to add the labels. Hence, for each upper level you get multiple labels, one per row. To solve this, aggregate the data to one bounding box per group at each level and pass these aggregated datasets as data to ggfittext::geom_fit_text. To deal with overlapping labels you could, for example, use the place argument of ggfittext::geom_fit_text to move the class labels to the bottom left and the type labels to the top right.

    library(tidyverse)
    library(treemap)
    
    set.seed(123)
    
    # tm_plot_data is created as in the question.
    # Bounding box per segment: one row, and hence one label, per segment
    tm_seg <- tm_plot_data %>% 
      group_by(my_segment) %>% 
      summarise(x0 = min(x0), y0 = min(y0), y1 = max(y1), x1 = max(x1)) %>% 
      ungroup()
    
    # Bounding box per class within each segment
    tm_class <- tm_plot_data %>% 
      group_by(my_segment, my_class) %>% 
      summarise(x0 = min(x0), y0 = min(y0), y1 = max(y1), x1 = max(x1)) %>% 
      ungroup()
    
    # Bounding box per type within each class and segment
    tm_type <- tm_plot_data %>% 
      group_by(my_segment, my_class, my_type) %>% 
      summarise(x0 = min(x0), y0 = min(y0), y1 = max(y1), x1 = max(x1)) %>% 
      ungroup()
    
    # Plot
    ggplot(tm_plot_data, aes(xmin = x0, ymin = y0, xmax = x1, ymax = y1)) +
      # add fill and borders for groups and subgroups
      geom_rect(aes(fill = color, size = primary_group),
        show.legend = FALSE,
        color       = "black",
        alpha       = 0.3
      ) +
      scale_fill_identity() +
      # set thicker lines for group borders
      scale_size(range = range(tm_plot_data$primary_group)) +
      # add labels
      ggfittext::geom_fit_text(data = tm_seg, aes(label = my_segment), color = "white", min.size = 4) +
      ggfittext::geom_fit_text(data = tm_class, aes(label = my_class), color = "blue", min.size = 1, place = "bottomleft") +
      ggfittext::geom_fit_text(data = tm_type, aes(label = my_type), color = "red", min.size = 1, place = "topright") +
      # options
      scale_x_continuous(expand = c(0, 0)) +
      scale_y_continuous(expand = c(0, 0)) +
      theme_void()
    #> Warning: Removed 3 rows containing missing values (geom_fit_text).
    #> Warning: Removed 12 rows containing missing values (geom_fit_text).
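
The two warnings above suggest that the data returned by treemap holds one row per rectangle at every level, with the deeper index columns set to NA on the higher-level rows. The aggregated tm_class and tm_type then still carry rows with NA labels, which geom_fit_text drops. As an alternative sketch, assuming that structure and the level column already used in the question, the label data could be taken per level with filter() instead of being aggregated. The toy_tm data frame below is a hypothetical stand-in for tree_p$tm with made-up coordinates, so the snippet runs without treemap.

```r
library(dplyr)

# Hypothetical stand-in for tree_p$tm: one row per rectangle at each level.
# Higher-level rows have NA in the deeper index columns (assumed structure).
toy_tm <- data.frame(
  my_segment = c("seg_a", "seg_a", "seg_a", "seg_a", "seg_a",
                 "seg_b", "seg_b", "seg_b"),
  my_class   = c(NA, "class_1", "class_1", "class_2", "class_2",
                 NA, "class_4", "class_4"),
  my_type    = c(NA, NA, "type_1", NA, "type_1",
                 NA, NA, "type_2"),
  level      = c(1, 2, 3, 2, 3, 1, 2, 3),
  x0         = c(0, 0, 0, 0, 0, 0.6, 0.6, 0.6),
  y0         = c(0, 0, 0, 0.5, 0.5, 0, 0, 0),
  w          = c(0.6, 0.6, 0.6, 0.6, 0.6, 0.4, 0.4, 0.4),
  h          = c(1, 0.5, 0.5, 0.5, 0.5, 1, 1, 1)
)

# Same corner computation as in the question
toy_tm <- toy_tm %>%
  mutate(x1 = x0 + w,
         y1 = y0 + h)

# One label data set per hierarchy level; no NA labels remain in any of them
tm_seg   <- toy_tm %>% filter(level == 1)
tm_class <- toy_tm %>% filter(level == 2)
tm_type  <- toy_tm %>% filter(level == 3)
```

With the real data, the same three filter() calls on tm_plot_data should give data frames that can be passed as data to the three ggfittext::geom_fit_text layers in place of the aggregated versions. Keeping the aggregation and adding filter(!is.na(my_class)) or filter(!is.na(my_type)) before it is another option.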