Calling variable in df within function


I'm getting an error saying that "cyl" is not found. I'm fairly sure it has to do with tidy eval, but I can't figure out what else I need to do.

I'm looking to evaluate all values in the cyl column and raise a flag if any of them are not greater than zero.

check_percent_fun = function(df, var){
  
apply(df, MARGIN=2, stopifnot("not greater than 0" =
            {{var}} > 0 )
)
}

check_percent_fun(mtcars, cyl)

Solution

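  • Tidy eval is a framework for non-standard evaluation (NSE) used by tidyverse packages such as dplyr. Base R has its own NSE functions, such as subset() and lm(), but they don't use the tidy eval framework, so they have no idea what to do with {{ }}. Outside a tidy eval function, {{var}} is just var wrapped in two pairs of ordinary braces, so R tries to evaluate the argument cyl as a regular object in the calling environment, finds nothing by that name, and reports it as not found.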

    Here's a dplyr way to do this (as far as I can discern), which uses rlang's tidy eval:

    library(dplyr)
    check_gt0 <- function(df, var) {
        df |>
          summarize(across({{var}}, ~stopifnot("not greater than 0" = .x > 0)))
    }
    

    With cyl we get no error, but with vs we trigger the stopifnot, since it has zeroes:

    check_gt0(mtcars, cyl)
    # data frame with 0 columns and 1 row
    
    check_gt0(mtcars, vs)
    # Error in `summarize()`:
    #ℹ In argument: `across(vs, ~stopifnot(`not greater than 0` = .x > 0))`.
    #Caused by error in `across()`:
    #! Can't compute column `vs`.
    #Caused by error in `stopifnot()`:
    #! not greater than 0
    #Run `rlang::last_trace()` to see where the error occurred.
    

    Another possible variation uses dplyr::pull(), which also supports tidy eval:

    check_gt0_b <- function(df, var) {
      stopifnot("not greater than 0" = dplyr::pull(df, {{var}}) > 0)
    }
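
For comparison, the same check can be sketched in base R alone, without tidy eval. The function names check_gt0_chr and check_gt0_nse below are hypothetical, chosen only for illustration. The first variant assumes the caller passes the column name as a string; the second accepts a bare name by capturing it with substitute() and evaluating it with the data frame's columns in scope. Both rely on the named-argument form of stopifnot() already used above.

```r
# Hypothetical helper names; base R only, no dplyr or rlang required.

# Variant 1: take the column name as a string and index with [[ ]].
check_gt0_chr <- function(df, var) {
  stopifnot(is.character(var), length(var) == 1, var %in% names(df))
  stopifnot("not greater than 0" = df[[var]] > 0)
  invisible(TRUE)
}

# Variant 2: accept a bare column name, captured with substitute()
# and evaluated with the data frame's columns in scope.
check_gt0_nse <- function(df, var) {
  values <- eval(substitute(var), df, parent.frame())
  stopifnot("not greater than 0" = values > 0)
  invisible(TRUE)
}

check_gt0_chr(mtcars, "cyl")  # passes silently
check_gt0_nse(mtcars, cyl)    # passes silently

# vs contains zeroes, so the check fails; tryCatch() captures the
# error message as a value instead of halting the script.
flag <- tryCatch(
  check_gt0_nse(mtcars, vs),
  error = function(e) conditionMessage(e)
)
```

The string variant is usually the easier one to call from other functions or loops, since there is no quoting to manage. The tryCatch() wrapper is one possible way to "raise a flag" rather than stop outright.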