
Problem with reordering a nested x-axis with ggh4x package


I am using the ggh4x package with the following data set and code to create a boxplot showing the nested relation between two categorical variables.

Data used

set1 <- structure(list(Tx = c("Not Exposed", "Not Exposed", "Not Exposed", "Not Exposed", "Not Exposed", "Not Exposed", "Not Exposed", "Not Exposed", 
"Not Exposed", "Not Exposed", "Exposed", "Exposed", "Exposed", "Exposed", "Exposed", 
"Exposed", "Exposed", "Exposed", "Exposed", "Exposed", "Not Exposed", "Not Exposed", 
"Not Exposed", "Not Exposed", "Not Exposed", "Not Exposed", "Not Exposed", "Not Exposed", "Not Exposed", "Not Exposed", "Exposed", "Exposed", 
"Exposed", "Exposed", "Exposed", "Exposed", "Exposed", "Exposed", 
"Exposed", "Exposed"), Species = structure(c(1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L), levels = c("Species1", "Species2"), class = "factor"), Size = c(88.5, 
83.3, 59.5, 78, 50.3, 57, 78.2, 59, 85, 59.5, 13.1, 50.1, 55, 
60.1, 13.8, 27, 57.1, 53.1, 42, 16, 88.8, 26.2, 62, 108.5, 92.3, 
74.4, 77.3, 96, 88.7, 77.8, 50.7, 61.9, 65.1, 63.5, 64, 88.6, 
53.8, 82.1, 78.8, 75.6)), row.names = c(NA, -40L), class = c("tbl_df", 
"tbl", "data.frame"))

Nested boxplot

library(ggplot2)
library(ggh4x)

ggplot(set1, aes(x=interaction(Tx, Species), y=Size)) +
  stat_boxplot(geom="errorbar", width = 0.15) +
  geom_boxplot(show.legend=FALSE, outlier.shape = NA, aes(fill = interaction(Tx, Species))) +               
  geom_jitter(width = 0.1, shape=21, colour="black", fill="grey95", stroke=0.5, size=1) +
  guides(x="axis_nested") +
  theme_classic() +   
  theme(axis.title = element_text(face="bold"), 
        text = element_text(family = "serif", size = 12.5))

Right now, the nested relation is displayed on the x-axis just as I wanted. However, the groups are ordered alphabetically, and I'd like to choose the order myself (with the "Not Exposed" group before "Exposed").

I tried doing it with weave_factors() instead of interaction(), but then the plot doesn't display the nested relation correctly.

Is there an existing method to selectively reorder the groups?


Solution

  • In 99.9% of questions related to the (re-)ordering of axes, facets or legends, the answer is the same:

    Convert your variable(s) to factor(s), with the order of the levels set to the desired order.

    While it is possible to get the desired result with weave_factors, that approach depends on the order of the data (and requires some additional changes, see below). Hence, I think the more robust approach to make Not Exposed the first category is to use

    set1$Tx <- factor(set1$Tx, levels = c("Not Exposed", "Exposed"))
    

    or relevel() as in the answer by @StephanLaurent, or, depending on the desired order, one of the several convenience functions in the forcats package.
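    As a minimal base-R sketch of these options (using a short made-up vector instead of set1), the following compares the default alphabetical levels with an explicit levels argument, relevel(), and an order-of-appearance version that mimics forcats::fct_inorder() without needing forcats:

    ```r
    # Small made-up vector standing in for set1$Tx
    tx <- c("Not Exposed", "Not Exposed", "Exposed", "Exposed")

    # Default: factor() sorts the unique values alphabetically
    lv_default <- levels(factor(tx))

    # Option 1: set the order explicitly via the levels argument
    lv_explicit <- levels(factor(tx, levels = c("Not Exposed", "Exposed")))

    # Option 2: relevel() moves the reference level to the front
    lv_relevel <- levels(relevel(factor(tx), ref = "Not Exposed"))

    # Option 3: order of first appearance (what forcats::fct_inorder() does)
    lv_inorder <- levels(factor(tx, levels = unique(tx)))

    print(lv_default)   # alphabetical: "Exposed" first
    print(lv_explicit)  # "Not Exposed" first
    print(lv_relevel)   # "Not Exposed" first
    print(lv_inorder)   # "Not Exposed" first, because it appears first in tx
    ```

    Options 1 and 2 do not depend on the order of the rows, which is why they are the more robust choice; option 3 only gives "Not Exposed" first because it happens to come first in the data.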

    However, when doing so, you have to use interaction to get the desired nested axis (as in all examples in the docs, see ?guide_axis_nested).

    library(ggplot2)
    library(ggh4x)
    
    set1$Tx <- factor(set1$Tx, levels = c("Not Exposed", "Exposed"))
    
    ggplot(set1, aes(x = interaction(Tx, Species), y = Size)) +
      stat_boxplot(geom = "errorbar", width = 0.15) +
      geom_boxplot(
        show.legend = FALSE, outlier.shape = NA,
        aes(fill = interaction(Tx, Species))
      ) +
      geom_jitter(
        width = 0.1, shape = 21, colour = "black",
        fill = "grey95", stroke = 0.5, size = 1
      ) +
      guides(x = "axis_nested") +
      theme_classic() +
      theme(
        axis.title = element_text(face = "bold"),
        text = element_text(family = "serif", size = 12.5)
      )
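    A short sketch of why interaction() works here. With its default lex.order = FALSE, the first factor varies fastest, so all levels sharing the same Species are adjacent. guide_axis_nested() splits each label at the "." and, as in the question's plot, draws the part after it (here Species) on the outer row, which only reads as a nested axis when those outer labels form contiguous runs. The base-R check below uses a tiny made-up example and a throwaway helper (outer_runs(), hypothetical, written only for this illustration); it also shows that lex.order = TRUE, which for this data gives the same level order as weave_factors(Tx, Species) in the question, interleaves the outer labels:

    ```r
    # Tiny made-up example with Tx already converted to a factor
    Tx <- factor(c("Not Exposed", "Exposed", "Not Exposed", "Exposed"),
                 levels = c("Not Exposed", "Exposed"))
    Species <- factor(c("Species1", "Species1", "Species2", "Species2"))

    # Hypothetical helper: run lengths of the label part after the last "."
    outer_runs <- function(f) rle(sub("^.*\\.", "", levels(f)))$lengths

    # Default: Tx varies fastest, so each Species spans two adjacent levels
    x_default <- interaction(Tx, Species)
    print(levels(x_default))
    print(outer_runs(x_default))  # 2 2: contiguous, nests cleanly

    # lex.order = TRUE: Tx varies slowest, so Species alternates
    x_lex <- interaction(Tx, Species, lex.order = TRUE)
    print(levels(x_lex))
    print(outer_runs(x_lex))      # 1 1 1 1: outer labels interleaved
    ```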
    

    However, for your (example) data, and given how weave_factors works and differs from interaction (see below), you could actually get the desired result without converting to a factor, by switching the order in which you pass Species and Tx to weave_factors and by using the more verbose guide_axis_nested() with inv = TRUE:

    library(ggplot2)
    library(ggh4x)
    
    # Make sure Tx is a plain character vector, not a factor
    set1$Tx <- as.character(set1$Tx)
    
    ggplot(set1, aes(x = weave_factors(Species, Tx), y = Size)) +
      stat_boxplot(geom = "errorbar", width = 0.15) +
      geom_boxplot(
        show.legend = FALSE, outlier.shape = NA,
        aes(fill = weave_factors(Species, Tx))
      ) +
      geom_jitter(
        width = 0.1, shape = 21, colour = "black",
        fill = "grey95", stroke = 0.5, size = 1
      ) +
      guides(x = guide_axis_nested(inv = TRUE)) +
      theme_classic() +
      theme(
        axis.title = element_text(face = "bold"),
        text = element_text(family = "serif", size = 12.5)
      )
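    The same contiguity argument explains why this variant works. A minimal base-R sketch, assuming (as the weave_factors rules listed below suggest) that for this data weave_factors(Species, Tx) yields the same level order as interaction(Species, factor(Tx, levels = unique(Tx)), lex.order = TRUE): Species now comes first in every label and forms contiguous runs, which is why inv = TRUE is needed to put that first part on the outer row.

    ```r
    # Tiny made-up example; Tx is a plain character vector
    Tx <- c("Not Exposed", "Exposed", "Not Exposed", "Exposed")
    Species <- factor(c("Species1", "Species1", "Species2", "Species2"))

    # Assumed base-R stand-in for weave_factors(Species, Tx):
    # Species takes priority, Tx keeps its order of first appearance
    x_woven <- interaction(Species, factor(Tx, levels = unique(Tx)),
                           lex.order = TRUE)
    print(levels(x_woven))

    # The part BEFORE the "." (Species) now forms contiguous runs,
    # so it has to be drawn on the outer row, hence inv = TRUE
    first_part <- sub("\\..*$", "", levels(x_woven))
    print(rle(first_part)$lengths)  # 2 2
    ```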
    

    weave_factors vs. interaction:

    weave_factors differs from interaction in two respects (see ?weave_factors):

    1. it orders the new levels such that the levels of the first input variable are given priority over those of the second input.

    2. it treats non-factor inputs as if their levels were unique(as.character(x)), i.e. the levels are set in the order in which they appear in the data (similar to what forcats::fct_inorder does).

    For that reason, weave_factors gives, IMHO, a more natural ordering of the levels of the combined factor:

    weave_factors(set1$Tx, set1$Species) |> levels()
    #> [1] "Not Exposed.Species1" "Not Exposed.Species2" "Exposed.Species1"    
    #> [4] "Exposed.Species2"
    

    i.e. the levels of the combined factor are "ordered" first by the first input, then by the second, whereas with interaction it's the other way around:

    interaction(set1$Tx, set1$Species) |> levels()
    #> [1] "Not Exposed.Species1" "Exposed.Species1"     "Not Exposed.Species2"
    #> [4] "Exposed.Species2"
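    The two rules can be made concrete with a small base-R approximation. weave_like() below is a hypothetical helper written for illustration only; it is not part of ggh4x and does not try to replicate every detail of weave_factors. It gives non-factors their levels in order of first appearance (rule 2) and lets the first input take priority via lex.order = TRUE (rule 1). On a made-up character Tx it reproduces the weave_factors order shown above, while plain interaction() on the same character input falls back to alphabetical levels, which is the ordering problem from the question:

    ```r
    # Hypothetical helper, NOT part of ggh4x: approximates the two rules above
    weave_like <- function(a, b, sep = ".") {
      # Rule 2: non-factors get levels in order of first appearance
      as_fct <- function(x) {
        if (is.factor(x)) x else factor(x, levels = unique(as.character(x)))
      }
      # Rule 1: levels of the first input take priority (lex.order = TRUE);
      # drop = TRUE removes combinations that never occur in the data
      interaction(as_fct(a), as_fct(b), sep = sep, lex.order = TRUE, drop = TRUE)
    }

    # Made-up example with Tx as a plain character vector, in data order
    Tx <- c("Not Exposed", "Not Exposed", "Exposed", "Exposed",
            "Not Exposed", "Not Exposed", "Exposed", "Exposed")
    Species <- factor(rep(c("Species1", "Species2"), each = 4))

    lv_weave <- levels(weave_like(Tx, Species))
    lv_interaction <- levels(interaction(Tx, Species))  # character Tx: alphabetical
    print(lv_weave)
    print(lv_interaction)
    ```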