How to add "variables grid" below ggplot


In biomedical research, it's not uncommon to do an experiment where one varies a bunch of conditions (e.g., genotypes of mice or cell lines, treatment with various drugs), and measures a single major output variable for each experiment (e.g. blood sugar level, expression of a reporter gene).

A common way to plot these is as a series of bar plots or boxplots (or violin plots, or whatever), arranged horizontally, with a grid underneath depicting the variables that were changed in each condition.

It's easy to make a graph like this in Excel, setting the left-to-right order however you want, but then of course you have to mark it up in Illustrator or the like. It would be much nicer to be able to do this in R, especially for exploratory analysis, where one could hopefully have the markup done automatically. But I can't seem to find a good way to do this.

I've pasted code below showing an example of the kind of data I'm working with, how it is organized, and how I was able to kludge up a horizontal barplot with labels by manually specifying an ordering variable that accounts for the absence or presence of two treatment conditions.

I also included a genotype variable, which separates the data into two groups; the only way I found to plot these side by side was to use facet_wrap. Below, on the left, is the output I could get from native ggplot, and on the right is an Illustrator-edited version that indicates how I would like the figure laid out. Is there a package that can make a graph like this, or a workaround that might be used to put such a graph together "by hand" in R?

(I'm pretty comfortable with ggplot coding.)

ggplot output vs. desired graph

library(tidyverse)

test <- structure(list(replicate = c(1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2,
                                     2, 3, 3, 3, 3, 3, 3, 3, 3),
                       genotype = c("A", "A", "A", "A",
                                    "B", "B", "B", "B", "A", "A", "A", "A", "B", "B", "B", "B", "A",
                                    "A", "A", "A"),
                       treat1 = c(FALSE, FALSE, TRUE, TRUE, FALSE, FALSE,
                                  TRUE, TRUE, FALSE, FALSE, TRUE, TRUE, FALSE, FALSE, TRUE, TRUE,
                                  FALSE, FALSE, TRUE, TRUE),
                       treat2 = c(FALSE, TRUE, FALSE, TRUE,
                                  FALSE, TRUE, FALSE, TRUE, FALSE, TRUE, FALSE, TRUE, FALSE, TRUE,
                                  FALSE, TRUE, FALSE, TRUE, FALSE, TRUE),
                       output_norm = c(1, 5,
                                       1.75, 4.74, 1, 13.61, 0.7, 7.96, 1, 3, 1.67, 2.51, 1, 6.44, 0.93,
                                       10.92, 1, 3.63, 2.24, 6.59)),
                  row.names = c(NA, -20L), class = c("tbl_df", "tbl", "data.frame"))

# Create ordered factor variable accounting for treat1 vs treat2, for x-axis
condition <- tibble(
  treat1=c(FALSE, TRUE, FALSE, TRUE),
  treat2=c(FALSE, FALSE, TRUE, TRUE),
  treatment=c('untreated', 'treat1', 'treat2', 'treat1 + treat2')) %>%
  mutate(treatment=ordered(treatment, levels=treatment)) %>% print()

# Add treatment variable to test data
test <- left_join(test, condition, by=c('treat1', 'treat2')) %>% print()

# Bar graph with individual data points
ggplot(test, aes(x=treatment, y=output_norm)) +
  geom_bar(stat='summary', fun='mean', position='dodge') +
  geom_jitter(width=0.1, height=0) +
  facet_wrap(~genotype) +
  labs(title='output by genotype', x='Treatment', y='output') +
  theme_bw() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

UPDATE: The answer below by stefan worked well, and motivated me to write a generalized function that uses legendry to create plots like this. The only ugly part is that I have to build the guides manually as a string, assembling one guide_axis_base call per variable into a stack, and then use eval(parse(text=)) to execute it within ggplot.

grid_plot <- function(df, output, orgvars, 
                      fun='mean', 
                      logical.convert=c('', '+'),
                      width=0.1,
                      theme='bw') {
  # function to plot df as summarized barplot + jittered individual points, with
  #   grid of variables underneath
  # df = dataframe to plot
  # output = name of output variable in df
  # orgvars = ordered vector of variable names for organizing and summarizing data
  # fun = function to summarize data (default = 'mean')
  # logical.convert = vector of strings for replacing FALSE/TRUE (default = c('', '+'))
  # width = width of jitter plot (default = 0.1)
  # theme = theme for plot (default = 'bw')
  
  # return = ggplot object
  
  # library() stops with an error if legendry is missing (require() only warns)
  library(legendry)
  
  # remove any columns other than output and orgvars
  df <- select(df, all_of(c(orgvars, output)))
  
  # convert logical variables to strings
  if(!is.null(logical.convert)) {
    df <- mutate(df, across(where(is.logical), function(x) {
        ifelse(x, logical.convert[2], logical.convert[1])
      }))
    }
  
  # summarize df using orgvars and fun
  df_summ <- group_by(df, across(all_of(orgvars))) %>%
    summarise(output_summ = do.call(fun, list(.data[[output]])), .groups="drop")

  # arrange df_summ by orgvars, add x.axis variable based on sort order
  df_summ <- arrange(df_summ, pick(all_of(orgvars))) %>% 
    mutate(x.axis = 1:nrow(df_summ))
  
  # merge x.axis variable into original df
  df <- left_join(df, select(df_summ, -output_summ), by=orgvars) %>% 
    arrange(pick(all_of(orgvars)))

  # create key_labels dataframe
  key_labels <- select(df_summ, -c(output_summ, x.axis))
  
  # create text for calling stack creation function 
  # (there must be a smarter way to do this but I haven't found it yet)
  st_call <- sapply(seq_along(orgvars), function(i) {
    paste0(
      'guide_axis_base(key = key_manual(aesthetic = 1:', nrow(key_labels), ',\n',
      'label=c("', str_flatten(key_labels[[orgvars[i]]], '", "'), '")),\n',
      'theme=theme(',
      ifelse(i>1, 'axis.ticks.x=element_blank(),\n', ''),
      'axis.text.x = element_text(vjust = 0)))'
    )
  }) 
  
  st_call <- str_flatten(st_call, collapse = ',\n') 

  gu <- paste0('guides(\n x=compose_stack(\n',
               st_call,
               ',\n',
               'side.titles = c("',
               str_flatten(orgvars, collapse='", "'), 
               '"),\n',
               'theme = theme(legendry.axis.subtitle = element_text(vjust = 0))))')
  # uncomment below to see guides text (tidy_source() is from the formatR package)
  # formatR::tidy_source(text = gu, args.newline = TRUE)
  
  # create ggplot object
  
  pl <- ggplot(df_summ, aes(x = x.axis, y = output_summ)) +
    # bar plot of the already-summarized values (one row per x position)
    geom_col() +
    # jitter plot of individual values
    geom_jitter(
      data=df, aes(x = x.axis, y = .data[[output]]),
      width = width, height = 0) +
    xlim(0.5, nrow(key_labels) + 0.5) +
    labs(title = NULL, x = NULL, y = output) +
    eval(parse(text=paste0('theme_', theme, '()'))) +
    # Add margin to make room for side titles
    theme(plot.margin = margin(5.5, 5.5, 5.5, 22)) +
    eval(parse(text=gu))
  
  return(pl)
}

Example output:

grid_plot(test, 'output_norm', c('genotype', 'treat2', 'treat1'), theme='gray')
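
The update above calls the string-building plus eval(parse(text=)) step the ugly part. One possible way around it is to build the guide_axis_base() objects in a list and splice them into compose_stack() with do.call(). This is only a sketch: it assumes compose_stack() takes its guides as ordinary `...` arguments, as in the calls shown in this thread. stack_guides() is a hypothetical helper name, not part of legendry.

```r
library(ggplot2)
library(legendry)

# Hypothetical helper (not a legendry function): build one x-axis guide per
# variable in `orgvars`. `key_labels` is assumed to have one row per bar,
# in left-to-right order, as in grid_plot() above.
stack_guides <- function(key_labels, orgvars) {
  positions <- seq_len(nrow(key_labels))

  axes <- lapply(seq_along(orgvars), function(i) {
    ax_theme <- theme(axis.text.x = element_text(vjust = 0))
    # keep tick marks only on the first axis of the stack
    if (i > 1) ax_theme <- ax_theme + theme(axis.ticks.x = element_blank())

    guide_axis_base(
      key = key_manual(
        aesthetic = positions,
        label = as.character(key_labels[[orgvars[i]]])
      ),
      theme = ax_theme
    )
  })

  # do.call() passes each list element as a separate argument,
  # so no code has to be pasted together as text
  do.call(
    compose_stack,
    c(axes, list(
      side.titles = orgvars,
      theme = theme(legendry.axis.subtitle = element_text(vjust = 0))
    ))
  )
}
```

Inside grid_plot(), the final `eval(parse(text=gu))` term could then presumably be replaced by `guides(x = stack_guides(key_labels, orgvars))`. The theme lookup might be handled the same way, for example with `match.fun(paste0('theme_', theme))()`, though that variant is untested here.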



Solution

  • A more recent option is the legendry package, whose compose_stack() lets you stack multiple axes and add side titles. It does take some effort to set up the individual guide_axis_base() objects, though.

    library(tidyverse)
    library(legendry)
    
    test <- test |>
      mutate(
        treat1 = if_else(treat1, "+", " "),
        treat2 = if_else(treat2, "+", " "),
        x = interaction(treat1, treat2, genotype)
      )
    
    key_labels <- distinct(
      test,
      x, genotype, treat1, treat2
    ) |>
      mutate(x = as.numeric(x))
    
    gab <- function(which, ticks = TRUE) {
      guide_axis_base(
        key = key_manual(
          key_labels$x, key_labels[[which]]
        ),
        theme = theme(
          axis.ticks.x = if (!ticks) element_blank(),
          axis.text.x = element_text(vjust = 0)
        )
      )
    }
    
    ggplot(test, aes(x = x, y = output_norm)) +
      geom_bar(stat = "summary", fun = "mean", position = "dodge") +
      geom_jitter(width = 0.1, height = 0) +
      labs(title = "output by genotype", x = NULL, y = "output") +
      theme_bw() +
      guides(
        x = compose_stack(
          gab("genotype"),
          gab("treat1", ticks = FALSE),
          gab("treat2", ticks = FALSE),
          side.titles = c("genotype", "treat1", "treat2"),
          theme = theme(
            legendry.axis.subtitle = element_text(
              vjust = 0
            )
          )
        )
      ) +
      # Add margin to make room for side titles
      theme(plot.margin = margin(5.5, 5.5, 5.5, 22))
    

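
Because the x variable in the code above comes from interaction(), the left-to-right order of the bars follows the level order of that factor. The base R sketch below uses made-up toy factors rather than the question's data to show how the argument order and lex.order affect this. The "-"/"+" labels and the explicit levels are assumptions, chosen so the ordering does not depend on locale.

```r
# Toy factors with explicit levels (hypothetical, not the question's data)
treat1   <- factor(c("-", "+"), levels = c("-", "+"))
treat2   <- factor(c("-", "+"), levels = c("-", "+"))
genotype <- factor(c("A", "B"), levels = c("A", "B"))

# All 8 combinations; expand.grid() varies the first column fastest
grid <- expand.grid(treat1 = treat1, treat2 = treat2, genotype = genotype)

# Default (lex.order = FALSE): the first argument varies fastest, so bars
# are grouped by genotype, then treat2, then treat1
x_default <- interaction(grid$treat1, grid$treat2, grid$genotype)
levels(x_default)[1:3]
# "-.-.A" "+.-.A" "-.+.A"

# lex.order = TRUE: the first argument varies slowest instead, so listing
# the variables from outermost to innermost gives the same bar order
x_lex <- interaction(grid$genotype, grid$treat2, grid$treat1, lex.order = TRUE)
levels(x_lex)[1:3]
# "A.-.-" "A.-.+" "A.+.-"

# Numeric positions, as passed to key_manual() in the code above
as.integer(x_default)
as.integer(x_lex)
```

Reordering the arguments, or the levels of the individual factors, should be enough to choose which variable forms the outer grouping and which alternates bar to bar.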