Tags: r, dplyr, tidyverse, tibble, plotmath

Creating a new column containing plotmath expressions using `dplyr::case_when`


I would like to create a new column containing plotmath expressions that I later plan to use somewhere else in the analysis pipeline.

Here is a minimal example of what I tried. I am trying to create a new column called label that holds a different plotmath expression depending on the value of the y column.

This doesn't seem to work:

# loading needed libraries
library(tidyverse)

# creating a dataframe
df <- data.frame(x = c(1:10), y = c(rep("a", 5), rep("b", 5))) %>%
  tibble::as_data_frame(x = .)

# adding a new column with plotmath expression
df %>%
  dplyr::mutate(.data = .,
                label = dplyr::case_when(
                  y == "a" ~ paste(list(
                  "'This is'", "~alpha==", 1
                ), sep = ""),
                y == "b" ~ paste(list(
                  "'This is'", "~beta==", 2
                ), sep = "")))
#> Error in mutate_impl(.data, dots): Evaluation error: `y == "a" ~ paste(list("'This is'", "~alpha==", 1), sep = "")`, `y == "b" ~ paste(list("'This is'", "~beta==", 2), sep = "")` must be length 10 or one, not 3.

Created on 2018-06-26 by the reprex package (v0.2.0).


Solution

  • The error message shows that each case returns a vector of length 3. This happens because paste() receives the list as a single argument: sep only separates multiple arguments, so it has no effect here, and you get a vector of the same length as the list, so

    paste(list(
                  "'This is'", "~alpha==", 1
                ), sep = "")
    

    returns a vector of length 3, not 1 or 10 as required. If you use the collapse argument of paste() instead, the elements are joined into a single string, that is, a vector of length 1. In context:

    df %>%
      dplyr::mutate(.data = .,
                label = dplyr::case_when(
                  y == "a" ~ paste(list(
                    "'This is'", "~alpha==", 1
                  ), collapse = ""),
                  y == "b" ~ paste(list(
                    "'This is'", "~beta==", 2
                  ), collapse = "")))
    # A tibble: 10 x 3
    #       x y     label             
    #   <int> <fct> <chr>             
    # 1     1 a     'This is'~alpha==1
    # 2     2 a     'This is'~alpha==1
    # 3     3 a     'This is'~alpha==1
    # 4     4 a     'This is'~alpha==1
    # 5     5 a     'This is'~alpha==1
    # 6     6 b     'This is'~beta==2 
    # 7     7 b     'This is'~beta==2 
    # 8     8 b     'This is'~beta==2 
    # 9     9 b     'This is'~beta==2 
    #10    10 b     'This is'~beta==2
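
To see the difference between sep and collapse outside of mutate(), here is a minimal base R sketch. The variable names (parts, by_sep, by_collapse, by_args) are only illustrative and do not come from the question. The last step assumes the strings will later be converted with parse(text = ...), which is one common way to turn a plotmath string into an expression; the question does not say how the labels will be consumed.

```r
# The same pieces used in the question, stored as one list.
parts <- list("'This is'", "~alpha==", 1)

# paste() gets a single argument here, so `sep` has nothing to separate:
# the result has one string per list element.
by_sep <- paste(parts, sep = "")
length(by_sep)
#> [1] 3

# `collapse` joins those elements into a single string.
by_collapse <- paste(parts, collapse = "")
by_collapse
#> [1] "'This is'~alpha==1"

# Passing the pieces as separate arguments also yields length 1.
by_args <- paste0("'This is'", "~alpha==", 1)
identical(by_collapse, by_args)
#> [1] TRUE

# Assumption: the label is later parsed into a plotmath expression.
label_expr <- parse(text = by_collapse)
length(label_expr)
#> [1] 1
```

Since each right-hand side of case_when() must have length 1 or the number of rows, either the collapse form or the paste0() form with separate arguments should satisfy that requirement.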