
How to use case_when with grep and names in R function for string filtering?

I'm trying to write a function that takes a dataframe and creates a new column whose row values are conditional on the columns that contain the strings "bead" and "coef" in their column names.

The function would have the following inputs:

  • data: the dataframe
  • analyte: a string that is the name of the analyte (e.g. "protein1" or "protein2")

The dataframe has columns like this:

ID <- c(1,2)
protein1_bead <- c(50,20)
protein1_coef <- c(20,60)
protein2_bead <- c(50,20)
protein2_coef <- c(20,60)

analyte_df <- data.frame(cbind(ID, protein1_bead, protein1_coef, protein2_bead, protein2_coef))

Here are the steps of the function:

  1. Take "data" and select the columns that contain "analyte" in their names. For example, if I pass the string "protein1", it selects the columns named "protein1_bead" and "protein1_coef".
  2. Create a new column called "target" with case_when: if the column containing "bead" is < 30 and the column containing "coef" is > 50, the value is "re_run"; otherwise it is NA.
  3. Filter the rows to just the observations that have "re_run" in the column "target".

The syntax in the end might be something like:

target_function(data = analyte_df, analyte = "protein1")

I want this:

ID protein1_bead protein1_coef target
2 20 60 re_run

This is the function I tried to write:

target_function <- function(data, analyte){
  target_list <- data %>% select(ID, contains(analyte)) %>%
    mutate(target = case_when(grep("bead", names(.)) < 30 & grep("coef", names(.)) > 50 ~ "re_run",
                              TRUE ~ NA_character_)) %>% 
    filter(target == "re_run")
}

But I get the error:

Error in `mutate()`:
! Problem while computing `target = case_when(...)`.
✖ `target` must be size 2 or 1, not 0.

I suspect this is because I can't use grep() and names() inside case_when() like this, but I haven't found an answer yet. Thanks in advance.
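A quick check supports that suspicion. In this sketch (the data frame `df` is rebuilt from the example values so the block runs on its own, and the variable names are only illustrative), `grep()` applied to `names()` returns the positions of the matching column names, not the values stored in those columns, so comparing its result with 30 or 50 never looks at the data.

```r
# Rebuild part of the example so the block is self-contained
df <- data.frame(
  ID = c(1, 2),
  protein1_bead = c(50, 20),
  protein1_coef = c(20, 60)
)

# grep() on names() gives column positions, not column values
bead_pos <- grep("bead", names(df))
coef_pos <- grep("coef", names(df))
print(bead_pos)  # position of protein1_bead among the columns
print(coef_pos)  # position of protein1_coef among the columns

# To compare the values, the positions have to be used to index the data
bead_vals <- df[[bead_pos]]
coef_vals <- df[[coef_pos]]
rerun <- bead_vals < 30 & coef_vals > 50
print(df[rerun, ])
```

Indexing with `[[` is one base R way to get from a matched name to the column itself; inside dplyr verbs the column has to be referenced in a way that the verb can evaluate against the data.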


  • I think the key here is !!sym(...). You can't pass a string where case_when() expects a column, so you have to convert the string to a symbol with sym() and unquote it with !!, which takes a little non-standard evaluation magic.

    library(dplyr)

    target_function <- function(data, analyte){
      data |>
        select(starts_with(analyte)) |>
        mutate(target = case_when(
          # build each column name from the analyte string, then unquote it as a symbol
          !!sym(glue::glue("{analyte}_bead")) < 30 &
            !!sym(glue::glue("{analyte}_coef")) > 50 ~ "re_run",
          TRUE ~ NA_character_
        )) |>
        # keep only the rows flagged for a re-run
        filter(target == "re_run")
    }

    target_function(data = analyte_df, analyte = "protein1")
    #>   protein1_bead protein1_coef target
    #> 1            20            60 re_run
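As a possible alternative, dplyr's `.data` pronoun accepts a column name held in a string, which avoids the explicit `!!sym(...)` step. The following is only a sketch under a few assumptions: the function name `target_function2` and the helper variables `bead_col` and `coef_col` are made up for illustration, the columns are assumed to follow the `<analyte>_bead` / `<analyte>_coef` naming from the example, and `ID` is kept in the selection so the result matches the output asked for in the question.

```r
library(dplyr)

# Example data, rebuilt here so the block runs on its own
analyte_df <- data.frame(
  ID = c(1, 2),
  protein1_bead = c(50, 20),
  protein1_coef = c(20, 60),
  protein2_bead = c(50, 20),
  protein2_coef = c(20, 60)
)

# Hypothetical variant of the function using the .data pronoun
target_function2 <- function(data, analyte) {
  # Assumed naming scheme: "<analyte>_bead" and "<analyte>_coef"
  bead_col <- paste0(analyte, "_bead")
  coef_col <- paste0(analyte, "_coef")

  data |>
    select(ID, starts_with(analyte)) |>
    mutate(target = case_when(
      # .data[[name]] looks the column up by its string name
      .data[[bead_col]] < 30 & .data[[coef_col]] > 50 ~ "re_run",
      TRUE ~ NA_character_
    )) |>
    # keep only the rows flagged for a re-run
    filter(target == "re_run")
}

result <- target_function2(analyte_df, "protein1")
print(result)
```

Building the two names up front with `paste0()` also drops the glue dependency, though either way of constructing the string should work with `.data[[...]]`.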