Tags: r, ggplot2, group-by, pipe, magrittr

How to use magrittr Tee pipe %T>% to create multiple ggplots for grouped data in R


I'm trying to create one histogram per group and then return a summary. Per this answer, I can use {braces} and print() to avoid issues in creating one plot then moving on to another. However, this doesn't seem to acknowledge grouping:

library(dplyr)
library(ggplot2)
library(magrittr)
data(mtcars)
mtcars |> 
  group_by(cyl) %T>%
  {print(ggplot(.) +
           geom_histogram(aes(x = carb)))} |> 
  summarise(meancarb = mean(carb))

The above code works insofar as it creates a single histogram then the summary, however:

mtcars %T>%
  {print(ggplot(.) +
           geom_histogram(aes(x = carb)))} |> 
  group_by(cyl) |> 
  summarise(meancarb = mean(carb))

The above code produces exactly the same output, which confirms that group_by() isn't being acknowledged.
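A quick check suggests why both versions draw the same plot. As far as I can tell, group_by() only attaches grouping metadata to the data frame. dplyr verbs such as summarise() act on that metadata, but `.` inside the braces is still the full table, and ggplot() does not split its data by dplyr groups. A minimal sketch (the names `grouped`, `seen_rows` and `result` are only for illustration):

```r
library(dplyr)
library(magrittr)

grouped <- mtcars %>% group_by(cyl)

# The grouping is stored as metadata; the rows themselves are unchanged.
group_vars(grouped)
nrow(grouped)

# Inside the tee branch, `.` is the whole grouped table, not one group,
# so whatever sits in the braces runs once on all rows.
seen_rows <- integer(0)
result <- grouped %T>%
  { seen_rows <<- c(seen_rows, nrow(.)) } %>%
  summarise(meancarb = mean(carb))

seen_rows      # one entry: the branch ran a single time
nrow(result)   # one row per cyl: summarise() did respect the groups
```

So the tee itself is behaving as designed; it is the code inside the branch that has to iterate over the groups.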

Does anyone know why the grouping isn't being used to create 1 histogram per unique cyl? Ideally I'd love to work out how to use Tee pipes to do this kinda thing more often, including saving the output to unique names, before continuing down the pipe. In general it feels like Tee pipes are underused, possibly because of the dearth of info about them, so if anyone has any cool examples to share, that might be great for the community.

Thanks!

Edit

Following divibisan's comment about dplyr::group_map (or group_walk):

mtcars |> 
  group_by(cyl) %T>%
  group_walk(.f = ~ ggplot(.) +
              geom_histogram(aes(x = carb))) |> 
  summarise(meancarb = mean(carb, na.rm = TRUE),
            sd3 = sd(carb, na.rm = TRUE) * 3)

This creates the summary table but no plot(s). With %T>%, the output is identical for group_map and group_walk. The output is also the same if I keep group_walk and replace %T>% with |>, so ostensibly group_walk is doing the same job as %T>%. With |> and group_map, I get:

Error in UseMethod("summarise"): no applicable method for 'summarise' applied to an object of class "list"

With print and braces:

mtcars |>
  group_by(cyl) %T>%
  {print(group_walk(.f = ~ ggplot(.) +
              geom_histogram(aes(x = carb))))} |>
  summarise(meancarb = mean(carb, na.rm = TRUE),
            sd3 = sd(carb, na.rm = TRUE) * 3)

I get:

Error in h(simpleError(msg, call)) : error in evaluating the argument 'x' in selecting a method for function 'print': argument ".data" is missing, with no default

Braces no print:

Error in group_map(.data, .f, ..., .keep = .keep): argument ".data" is missing, with no default

Print no braces: same as braces no print.

Edit2

More interesting ideas are coming forth. Thanks to Ricardo, this:

mtcars |> 
  group_split(cyl) |> 
  map(.f = ~ ggplot(.) +
        geom_histogram(aes(x = carb)))

Works insofar as it produces 1 plot per group. Success! But I can't find any combination of Tee/pipes which Tees off mtcars for the group_split AND map, and then resumes the main pipeline:

mtcars %T>% 
  group_split(cyl) %T>%
  map(.f = ~ ggplot(.) +
               geom_histogram(aes(x = carb))) |>
  summarise(meancarb = mean(carb))

Error in map(): In index: 1. With name: mpg. Caused by error in fortify(): data must be a <data.frame>, or an object coercible by fortify(), not a double vector.

Also anything other than 2 pipes means the plots aren't created.

Trying this another way around, by reordering the pipe structure (which won't always be possible/desirable):

mtcars |>
  group_by(cyl) %T>%
  summarise(meancarb = mean(carb)) |> 
  ungroup() |> 
  group_split(cyl) |> 
  map(.f = ~ ggplot(.) +
        geom_histogram(aes(x = carb)))

This creates the 3 plots but doesn't print the summary. Any combination of {braces} and/or print around the summary line gives:

Error in h(simpleError(msg, call)): error in evaluating the argument 'x' in selecting a method for function 'mean': object 'carb' not found.

Does anyone know whether the Tee pipe is explicitly for a single command, i.e. you can't pipe another command onto the tee branch, and then return to the main pipe? Thanks all
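On whether a tee branch is limited to one command: as I understand it, %T>% and %>% share the same precedence and chain left to right, so `x %T>% f() %T>% g()` calls `g()` on `x`, not on the result of `f()`. That would explain the fortify() error above, where map() ended up iterating over the columns of mtcars. One workaround, sketched below, is to wrap the whole side branch in braces so that it counts as a single right-hand side. `plot_file` is a hypothetical path, used only so the plots go to a throwaway PDF, and `bins = 8` is an arbitrary choice.

```r
library(dplyr)
library(purrr)
library(magrittr)
library(ggplot2)

# Hypothetical output file so the example draws to a PDF rather than the screen.
plot_file <- file.path(tempdir(), "carb_by_cyl.pdf")

result <- mtcars %T>%
  {
    # Everything in these braces is one tee branch; `.` is mtcars.
    pdf(plot_file)
    group_split(., cyl) %>%
      map(~ ggplot(.x) + geom_histogram(aes(x = carb), bins = 8)) %>%
      walk(print)
    dev.off()
  } %>%
  # Back on the main line: the branch's value is discarded, so this sees mtcars.
  group_by(cyl) %>%
  summarise(meancarb = mean(carb))

result
```

The braces can hold as many steps as needed, including a nested pipe; only the side effects (here, the pages written to the PDF) survive the branch.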

Edit 3

Thanks zephyr. Followup question: how to do multi-command tee pipes without a formula-format first command?

mtcars |>
  summarise(sdd = sd(carb, na.rm = TRUE))

Works fine, prints a single value.

mtcars %T>%
  summarise(sdd = sd(carb, na.rm = TRUE)) |> 
  summarise(
    meancarb = mean(carb, na.rm = TRUE),
    sd3 = sd(carb, na.rm = TRUE) * 3
  )

Doesn't print the value, performs the calculation invisibly then continues. Any combination of print and {braces} I've tried results in:

Error: function '{' not supported in RHS call of a pipe

or

Error in is.data.frame(x) : object 'carb' not found

Say I wanted, e.g.:

mtcars  |> 
  summarise(~{
    print(sdd = sd(carb))
    write_csv(file = "tmp.csv")
    .x
  }) |> 
  summarise(meancarb = mean(carb))

Any thoughts? Thanks again!
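For a multi-step side branch with no formula involved, one option that seems to work is a braced block on the right of %T>%. The "function '{' not supported" error quoted above comes from the native pipe, which only accepts a function call as its direct right-hand side; the magrittr pipes accept a braced expression and bind the incoming data to `.`. In this sketch `tmp_csv` is a hypothetical path, and base write.csv() stands in for readr's write_csv() purely to avoid an extra dependency.

```r
library(dplyr)
library(magrittr)

# Hypothetical output path for the side-branch CSV.
tmp_csv <- file.path(tempdir(), "tmp.csv")

result <- mtcars %T>%
  {
    # Side branch: `.` is mtcars; nothing returned here reaches the main line.
    sdd <- summarise(., sdd = sd(carb, na.rm = TRUE))
    print(sdd)
    write.csv(sdd, tmp_csv, row.names = FALSE)
  } %>%
  # Main line resumes with the original mtcars.
  summarise(
    meancarb = mean(carb, na.rm = TRUE),
    sd3 = sd(carb, na.rm = TRUE) * 3
  )

result
```

The side branch prints and saves its own summary, and the final summarise() still works on the untouched mtcars, so `carb` is found.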


Solution

  • You were on the right track with group_walk(), but you need to put the print() inside the mapped function:

    library(dplyr)
    library(purrr)
    library(magrittr)
    library(ggplot2)
    
    mtcars |> 
      group_by(cyl) %T>%
      group_walk(~ print(
        ggplot(.) + geom_histogram(aes(x = carb))
      )) |> 
      summarise(
        meancarb = mean(carb, na.rm = TRUE),
        sd3 = sd(carb, na.rm = TRUE) * 3
      )
    
    # A tibble: 3 × 3
        cyl meancarb   sd3
      <dbl>    <dbl> <dbl>
    1     4     1.55  1.57
    2     6     3.43  5.44
    3     8     3.5   4.67
    

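    Note that you can get the same result without %T>%. group_walk() calls its function only for side effects and invisibly returns the grouped data it was given, so summarise() still receives the original data. Here the plot is assigned to a name and printed inside the anonymous function, which ends by returning the original data frame: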

    mtcars |> 
      group_by(cyl) |>
      group_walk(~ {
        p <- ggplot(.x) + geom_histogram(aes(x = carb))
        print(p)
        .x
      }) |> 
      summarise(
        meancarb = mean(carb, na.rm = TRUE),
        sd3 = sd(carb, na.rm = TRUE) * 3
      )
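For the part of the question about saving each plot under a unique name, one possible extension of the same pattern uses the second argument that group_walk() passes to its function: `.y`, a one-row tibble holding the group key. A sketch, assuming ggsave() output to a temporary directory; `out_dir`, the `hist_cyl_*.pdf` file names, and the plot dimensions are made up for illustration.

```r
library(dplyr)
library(ggplot2)

out_dir <- tempdir()  # hypothetical output location

result <- mtcars |>
  group_by(cyl) |>
  group_walk(~ {
    p <- ggplot(.x) + geom_histogram(aes(x = carb), bins = 8)
    # .y$cyl is this group's key, used here to build a distinct file name.
    ggsave(file.path(out_dir, paste0("hist_cyl_", .y$cyl, ".pdf")),
           plot = p, width = 5, height = 4)
  }) |>
  summarise(meancarb = mean(carb, na.rm = TRUE))

result
```

Each group ends up in its own file (one per cyl value), and the pipe carries on to the summary because group_walk() hands back the grouped data it received.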