
Function using sym() and deparse(substitute()) not working as expected


I'm trying to build a function that accepts filtering terms that are either numeric or character. It should leave numeric terms as they are, convert anything else to a string, and then filter a data frame by those terms.

library(tidyverse)

fun1 = function(df,filt_col,filt_term_1,filt_term_2){
  
# convert filt_col to a symbol, which is needed to unquote it correctly in filter()
  filt_col = sym(filt_col)  
  
# check whether each filtering term is numeric
# if it is numeric, leave it as is; if not, deparse(substitute()) it (i.e. turn it into quoted text)
  if (!is.numeric(filt_term_1)) {filt_term_1 = deparse(substitute(filt_term_1))}
  if (!is.numeric(filt_term_2)) {filt_term_2 = deparse(substitute(filt_term_2))}
  
# doing one of two things depending on filtering terms that have been provided as arguments  
# if numeric, then filter < and > than numbers provided
# if character, then filter == to argument provided
  if(is.numeric(filt_term_1) & is.numeric(filt_term_2)) {
  
    group1 = df %>% filter(!!filt_col < filt_term_1)
    
    group2 = df %>% filter(!!filt_col > filt_term_2)
    
    
  } else {
    
    group1 = df %>% filter(!!filt_col == filt_term_1)
    
    group2 = df %>% filter(!!filt_col == filt_term_2)
    
  }

# put two groups in a list
  grouped_list = list(group1,group2)
  
  return(grouped_list)
  
}



# trying function which runs well with numeric args
fun1(iris,"Sepal.Length",4.9,4.9)

# but does not run with character args
fun1(iris,"Species",versicolor,virginica)

Firstly, I'm not sure what the error is about. Secondly, how can I make this more efficient? Ideally I would want to enter all arguments as non-quoted text.

Thank you.


Solution

  • The problem lies in the following three conditions, which break when unquoted expressions are passed as filt_term_1 and filt_term_2:

    • if (!is.numeric(filt_term_1))
    • if (!is.numeric(filt_term_2))
    • if(is.numeric(filt_term_1) & is.numeric(filt_term_2))

    If filt_term_* is a numeric or character value, these conditions can be evaluated, because the argument is an atomic vector. If an unquoted name such as versicolor is passed instead, they fail: is.numeric() forces R to evaluate the argument, no object called versicolor exists, and evaluation stops with an "object 'versicolor' not found" error before deparse(substitute()) is ever reached.
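The difference between capturing an argument and evaluating it can be seen in a minimal base R sketch. The helper names capture_only() and force_first() are hypothetical, chosen for illustration only, and the sketch assumes no object called versicolor exists in the calling environment.

```r
# Hypothetical helpers, for illustration only.

# substitute() looks at the expression without evaluating the argument.
capture_only <- function(x) deparse(substitute(x))

# is.numeric() needs the value of x, so the argument is evaluated first.
force_first <- function(x) {
  if (!is.numeric(x)) deparse(substitute(x)) else x
}

captured <- capture_only(versicolor)

# Catch the error so its message can be inspected instead of stopping the script.
failure <- tryCatch(
  force_first(versicolor),
  error = function(e) conditionMessage(e)
)

print(captured)
print(failure)
```

The first call returns the string "versicolor", while the second never gets as far as deparse(substitute()) because the condition itself fails.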

    A possible fix of your code:

    We could think of various workarounds, but to avoid an XY problem I'd propose, in your case, letting the type of the column in the dataset determine how the inputs are treated, not the type of the inputs themselves.

    library(tidyverse)
    
    fun1 = function(df, filt_col, filt_term_1, filt_term_2){
      
      # convert filt_col to a symbol, which is needed to unquote it correctly in filter()
      filt_col = sym(filt_col)  
      
      # check whether the column being filtered is numeric
      # if it is numeric, leave the terms as they are; if not, deparse(substitute()) them (i.e. turn them into quoted text)
      if (!is.numeric(pull(df, {{filt_col}}))) {filt_term_1 = deparse(substitute(filt_term_1))}
      if (!is.numeric(pull(df, {{filt_col}}))) {filt_term_2 = deparse(substitute(filt_term_2))}
      
      # doing one of two things depending on the type of the column being filtered
      # if numeric, then filter < and > than numbers provided
      # if character, then filter == to argument provided
      if(is.numeric(pull(df, {{filt_col}}))) {
        
        group1 = df %>% filter(!!filt_col < filt_term_1)
        
        group2 = df %>% filter(!!filt_col > filt_term_2)
        
        
      } else {
        
        group1 = df %>% filter(!!filt_col == filt_term_1)
        
        group2 = df %>% filter(!!filt_col == filt_term_2)
        
      }
      
      # put two groups in a list
      grouped_list = list(group1,group2)
      
      return(grouped_list)
      
    }
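The idea of dispatching on the column's type rather than on the argument's type can be sketched in base R. The helper comparison_mode() is hypothetical and not part of the answer's code; it only reports which branch a given column would take.

```r
# Hypothetical helper (not part of the answer's code): report which kind of
# comparison the column's type calls for.
comparison_mode <- function(df, col_name) {
  if (is.numeric(df[[col_name]])) "range" else "match"
}

mode_numeric <- comparison_mode(iris, "Sepal.Length")  # numeric column
mode_factor  <- comparison_mode(iris, "Species")       # factor column

print(mode_numeric)
print(mode_factor)
```

Because the check only inspects the data frame, the filtering terms are never evaluated by it, so an unquoted term such as versicolor cannot trigger an error at this point.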
    

    A simpler solution in your spirit:

    You might want to explore the {{ }} syntax that I used above and simplify your code even further. The chunk below accepts both fun1(iris, "Species", versicolor, virginica) and fun1(iris, Species, versicolor, virginica). However, you'd want to think carefully about which inputs to accept and why.

    library(tidyverse)
    
    fun1 = function(df, filt_col, filt_term_1, filt_term_2){
      
      if(is.numeric(pull(df, {{filt_col}}))) {
        
        group1 = df %>% filter({{filt_col}} < filt_term_1)
        group2 = df %>% filter({{filt_col}} > filt_term_2)
        
      } else {
        
        filt_term_1 <- deparse(substitute(filt_term_1))
        filt_term_2 <- deparse(substitute(filt_term_2))
        
        # We need the if_any (or similar hack) to accept both quoted and unquoted column names.
        group1 = df %>% filter(if_any({{filt_col}}, ~ . == filt_term_1))
        group2 = df %>% filter(if_any({{filt_col}}, ~ . == filt_term_2))
        
      }
      
      # put two groups in a list
      grouped_list = list(group1,group2)
      
      return(grouped_list)
      
    }
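Why if_any() helps here can be checked with a small sketch, assuming dplyr 1.0.4 or later is installed. if_any() selects columns with tidyselect, which accepts a column given either as a string or as a bare name, whereas a string used directly in a filter() comparison is just a string constant.

```r
library(dplyr)

# A column given as a string is selected by tidyselect inside if_any().
n_quoted <- iris %>%
  filter(if_any("Species", ~ . == "setosa")) %>%
  nrow()

# A bare column name is selected the same way.
n_unquoted <- iris %>%
  filter(if_any(Species, ~ . == "setosa")) %>%
  nrow()

# Without if_any(), a quoted name is compared as a literal string,
# so the condition is FALSE for every row.
n_literal <- iris %>%
  filter("Species" == "setosa") %>%
  nrow()

print(c(n_quoted, n_unquoted, n_literal))
```

The first two counts agree, while the third is zero because "Species" == "setosa" compares two string constants and never looks at the column.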
    

    A tidyverse-spirit solution:

    However, as pointed out by @Limey, it would probably be more in line with the spirit of tidyverse to take input columns as objects and values as character/numeric constants: (*)

    fun1(iris, Species, "versicolor", "virginica")

    fun1 <- function(df, filt_col, filt_term_1, filt_term_2) {
      
      if (is.numeric(pull(df, {{filt_col}}))) {
        
        group1 <- filter(df, {{filt_col}} < filt_term_1)
        group2 <- filter(df, {{filt_col}} > filt_term_2)
        
      } else {
        
        group1 <- filter(df, {{filt_col}} == filt_term_1)
        group2 <- filter(df, {{filt_col}} == filt_term_2)
        
      }
      
      list(group1, group2)
      
    }
    

    (*) As G. Grothendieck also pointed out, character values are normally not passed using NSE; only column names are.