Tags: r, ggplot2, tidyeval

How to use external character values to select x and y variables in ggplot?


I have data with multiple observed variables (say x, y, z), indexed by another set of variables (sect and item). Each time I run an experiment I get such a set of observations: for experiment "A" I get a value of each of x, y, z for every value of the index pair (sect, item). Then I run another experiment, "B", and get a whole new set of values for these variables.

What I want to do is simple: plot the observed values from one experiment against the corresponding values from another experiment, faceted by variable (so plot x from A against x from B, and likewise for y and z). I would like to do this in a "tidy" way, but the only approaches I can find seem more complicated than they should be.

Here's some simulated data to illustrate with:

library(tidyr)
library(dplyr)
library(ggplot2)
# Function to simulate an experiment
simdata <- function(experiment_name) {
  n <- 3 # number of sections
  m <- 7 # number of items per section
  tibble(
    # data points (section-item pairs)
    sect = factor(rep(1:n, each = m)), item = factor(rep(1:m, n)),
    # simulated observed values of three variables
    x = (1:(n * m))^1.05 + rnorm(n * m),
    y = (1:(n * m))^1.15 + rnorm(n * m, sd = 2),
    z = (1:(n * m))^1.25 + rnorm(n * m, sd = 4),
    experiment = experiment_name
  )
}
# Make an example dataset consisting of 
# data from experiments named "A", "B", and "C"
set.seed(42)
d <- bind_rows(simdata("A"), simdata("B"), simdata("C"))

So d is a dataset with data from three experiments. Here are the first few rows:

r$> d
# A tibble: 63 × 6
   sect  item      x     y      z experiment
   <fct> <fct> <dbl> <dbl>  <dbl> <chr>     
 1 1     1      2.37 -2.56  4.03  A         
 2 1     2      1.51  1.88 -0.528 A         
 3 1     3      3.53  5.97 -1.52  A         
 4 1     4      4.92  8.71  7.39  A         
 5 1     5      5.82  5.50  4.23  A         
# … with 58 more rows

Now, say I want to plot the observations from experiment A against those from experiment B. I'll call these control and alternative:

# a list of two experiment names, to compare
exps <- list(control = "A", alternative = "B")

Now here's the part that seems overcomplicated. The best way I can find involves two pivots (which seems ugly), producing one column per experiment. Then I wrap the experiment names as symbols (with sym()) and immediately unwrap them (with !!) so that I can refer to these columns by name, which, as far as I understand, is necessary for tidy evaluation.

This works, but is there a better way of doing this?

d_reshaped <- d |>
  ## There must be a better way of doing this reshaping
  pivot_longer(
    cols = -c(experiment, sect, item), 
    names_to = "var", values_to = "value"
  ) |>
  pivot_wider(names_from = c("experiment"), values_from = "value")
d_reshaped |>
  ## But I'm mostly looking for a better way to do this dereferencing...
  ggplot(aes(
    !!sym(exps$control),
    !!sym(exps$alternative)
  )) +
  geom_point(alpha = 0.5) +
  facet_grid(~var) +
  coord_fixed() +
  labs(title = paste("Experiment", exps, collapse = " vs "))

[Plot: Experiment A vs Experiment B]
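A possible way to avoid both the second pivot and the `!!sym()` step is to pivot once and then join the control rows to the alternative rows on `sect`, `item`, and `var`. Because `inner_join()` gives the overlapping `value` columns fixed names through its `suffix` argument, the plot can refer to them directly, whatever the experiments are called. The sketch below runs on a small hypothetical stand-in, `d_toy`, with the same layout as `d` (only `x` and `y`, made-up values); the `_control`/`_alternative` suffixes are illustrative choices, not part of the original code.

```r
library(tidyr)
library(dplyr)
library(ggplot2)

# Hypothetical stand-in for d: two experiments, 2 sections x 2 items, made-up values
d_toy <- tibble(
  sect = factor(rep(c(1, 1, 2, 2), times = 2)),
  item = factor(rep(c(1, 2, 1, 2), times = 2)),
  x = c(1, 2, 3, 4, 1.1, 2.1, 2.9, 4.2),
  y = c(5, 6, 7, 8, 5.2, 5.9, 7.1, 8.3),
  experiment = rep(c("A", "B"), each = 4)
)
exps <- list(control = "A", alternative = "B")

# One pivot: each observed variable becomes a (var, value) pair of columns
long <- d_toy |>
  pivot_longer(-c(sect, item, experiment), names_to = "var", values_to = "value")

# Pair control and alternative rows sharing sect, item and var;
# the overlapping non-key columns (value, experiment) get the suffixes
paired <- inner_join(
  filter(long, experiment == exps$control),
  filter(long, experiment == exps$alternative),
  by = c("sect", "item", "var"),
  suffix = c("_control", "_alternative")
)

# Fixed column names, so no string-to-symbol conversion is needed
p <- ggplot(paired, aes(value_control, value_alternative)) +
  geom_point(alpha = 0.5) +
  facet_grid(~var) +
  coord_fixed() +
  labs(
    x = exps$control, y = exps$alternative,
    title = paste("Experiment", exps, collapse = " vs ")
  )
```

With the real data, using `d` in place of `d_toy` should give the same faceted comparison as the two-pivot version, with the axes labelled by experiment name.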


I can see that instead of the wrapping/unwrapping !!sym part I could use aes_string(exps$control, exps$alternative), but that is deprecated, so I get the warning

Warning message:
`aes_string()` was deprecated in ggplot2 3.0.0.
ℹ Please use tidy evaluation ideoms with `aes()`

and so I suppose I shouldn't use it. Anyway, the main thing I wonder is whether there's a better way of doing the whole thing, since I think I must be overcomplicating this, but can't see how.
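For the dereferencing itself, the ggplot2 documentation for the deprecated `aes_string()` suggests the `.data` pronoun instead: `.data[[name]]` looks up the column whose name is held in the character value `name`, so no `sym()`/`!!` round trip is needed. A minimal sketch, using a hypothetical stand-in `d_wide_toy` (made-up values) in place of `d_reshaped`:

```r
library(ggplot2)

# Hypothetical stand-in for d_reshaped: one column per experiment
d_wide_toy <- data.frame(
  var = rep(c("x", "y", "z"), each = 2),
  A = c(1, 2, 3, 4, 5, 6),
  B = c(1.2, 1.9, 3.1, 4.3, 4.7, 6.2)
)
exps <- list(control = "A", alternative = "B")

# .data[[<string>]] refers to the column named by a character value
p <- ggplot(d_wide_toy, aes(.data[[exps$control]], .data[[exps$alternative]])) +
  geom_point(alpha = 0.5) +
  facet_grid(~var) +
  coord_fixed() +
  # set axis labels explicitly; the default label may show the .data expression
  labs(x = exps$control, y = exps$alternative)
```

With the real objects, the mapping would read `aes(.data[[exps$control]], .data[[exps$alternative]])` on `d_reshaped`.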


Solution

  • I think the way you are doing things is reasonable. It is a moderately complex data wrangling task to go from your existing data layout to the layout you need to plot.

    However, you have only gone halfway in getting the data into the right format, which is why you need to specify the pair of experiments in an external variable and use the !!sym(var) syntax. Although it takes a bit of effort, I think it is worth wrangling your data into the perfect plotting format:

    plot_df <- combn(unique(d$experiment), 2) |>
      apply(2, \(v) filter(d, experiment %in% v)) |>
      lapply(\(x) split(x, x$experiment)) |>
      lapply(\(x) cbind(
        x[[1]] |> rename_with(~ paste0(.x, 1)),
        x[[2]] |> rename_with(~ paste0(.x, 2))
      )) |>
      bind_rows() |>
      mutate(pair_experiments = paste(experiment1, experiment2, sep = " vs ")) |>
      select(!matches("^(sect|item|experiment)")) |>
      pivot_longer(-pair_experiments,
        names_pattern = "(.)(\\d)",
        names_to = c("var", ".value")
      ) |>
      rename(xvar = `1`, yvar = `2`)
    
    plot_df
    #> # A tibble: 189 x 4
    #>    pair_experiments var     xvar   yvar
    #>    <chr>            <chr>  <dbl>  <dbl>
    #>  1 A vs B           x      2.37   2.40 
    #>  2 A vs B           y     -2.56  -1.39 
    #>  3 A vs B           z      4.03   1.42 
    #>  4 A vs B           x      1.51   1.34 
    #>  5 A vs B           y      1.88   3.44 
    #>  6 A vs B           z     -0.528  0.689
    #>  7 A vs B           x      3.53   4.47 
    #>  8 A vs B           y      5.97   3.10 
    #>  9 A vs B           z     -1.52   3.46 
    #> 10 A vs B           x      4.92   4.62 
    #> # i 179 more rows
    #> # i Use `print(n = ...)` to see more rows
    

    So now you can get all your combinations in a single faceted plot:

    ggplot(plot_df, aes(xvar, yvar)) +
      geom_point(alpha = 0.5) +
      facet_grid(pair_experiments ~ var, switch = "y") +
      coord_fixed() +
      labs(x = NULL, y = NULL)
    


    And even if you don't want all pairs in one plot, it's trivial to filter to plot any pair you want:

    ggplot(plot_df |> filter(pair_experiments == "A vs B"), aes(xvar, yvar)) +
      geom_point(alpha = 0.5) +
      facet_grid(. ~ var) +
      coord_fixed() +
      labs(x = "A", y = "B")
    

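Two steps in the reshaping pipeline of the answer may be less familiar. `combn(..., 2)` returns a matrix with one column per unordered pair of experiments, which is why `apply(2, ...)` loops over its columns. In `pivot_longer()`, `names_pattern = "(.)(\\d)"` splits a name such as `x1` into the variable (`x`) and the experiment index (`1`), and the special `.value` entry in `names_to` turns each distinct index into its own column. A small sketch on toy data (the one-row tibble `wide` is hypothetical) illustrates both. Note that `(.)` matches exactly one character, so it assumes single-letter variable names like `x`, `y`, `z`; for longer names a pattern such as `"(.+)(\\d)"`, which takes only the final digit as the index, should work instead.

```r
library(tidyr)
library(tibble)

# combn() enumerates unordered pairs, one pair per column
pairs <- combn(c("A", "B", "C"), 2)
pair_labels <- apply(pairs, 2, paste, collapse = " vs ")
pair_labels
#> [1] "A vs B" "A vs C" "B vs C"

# Hypothetical row in the shape produced just before pivot_longer()
wide <- tibble(pair_experiments = "A vs B", x1 = 1, y1 = 3, x2 = 2, y2 = 4)

# "(.)(\\d)" captures the variable name and the experiment index;
# ".value" makes one output column per distinct index ("1" and "2")
long <- pivot_longer(
  wide, -pair_experiments,
  names_pattern = "(.)(\\d)",
  names_to = c("var", ".value")
)
long
```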