
Create R function using dplyr::filter problem


I've looked at other answers but cannot find a solution that makes the code below work. Basically, I'm creating a function that inner-joins two data frames and then filters on a column passed to the function.

The problem is that the filter part of the function doesn't work. However, it works if I take filter out of the function and append it afterwards, like mydiff("a") %>% filter(ax != ay)

Any suggestion is helpful.

Note that I am passing the function input in quotes.

library(dplyr)

# fake data
df1<- tibble(id = seq(4,19,2), 
             a = c("a","b","c","d","e","f","g","h"), 
             b = c(rep("foo",3), rep("bar",5)))
df2<- tibble(id = seq(10, 20, 1), 
             a = c("d","a", "e","f","k","m","g","i","h", "a", "b"),
             b = c(rep("bar", 7), rep("foo",4)))

# What I am trying to do
dplyr::inner_join(df1, df2, by = "id") %>% select(id, b.x, b.y) %>% filter(b.x!=b.y)

#> # A tibble: 1 x 3
#>      id b.x   b.y  
#>   <dbl> <chr> <chr>
#> 1    18 bar   foo

# creating a function so that I can filter by difference in column if I have more columns
mydiff <- function(filteron, df_1 = df1, df_2 = df2){
  require(dplyr, warn.conflicts = F)
  col_1 = paste0(quo_name(filteron), "x")
  col_2 = paste0(quo_name(filteron), "y")
  my_df<- inner_join(df_1, df_2, by = "id", suffix = c("x", "y"))
  my_df %>% select(id, col_1, col_2) %>% filter(col_1 != col_2)
}

# The filter part is not working as expected.
# It makes no difference whether I pipe into filter or leave it out.
mydiff("a")

#> # A tibble: 5 x 3
#>      id ax    ay   
#>   <dbl> <chr> <chr>
#> 1    10 d     d    
#> 2    12 e     e    
#> 3    14 f     k    
#> 4    16 g     g    
#> 5    18 h     h

Solution

  • Your original function did not work because col_1 and col_2 are strings, whereas dplyr::filter() expects unquoted column names. filter(col_1 != col_2) therefore compares the two strings themselves ("ax" != "ay"), which is always TRUE, so no rows are dropped. To fix this, first convert each string to a symbol with sym(), then unquote it inside filter() with !! (bang-bang).

    rlang has a handy function, qq_show(), that shows what an expression looks like after quoting/unquoting (see the output below).

    See also this similar question

    library(rlang)
    library(dplyr)
    
    # creating a function that can take either string or symbol as input
    mydiff <- function(filteron, df_1 = df1, df_2 = df2) {
    
      col_1 <- paste0(quo_name(enquo(filteron)), "x")
      col_2 <- paste0(quo_name(enquo(filteron)), "y")
    
      my_df <- inner_join(df_1, df_2, by = "id", suffix = c("x", "y"))
    
      cat('\nwithout sym and unquote\n')
      qq_show(col_1 != col_2)
    
      cat('\nwith sym and unquote\n')
      qq_show(!!sym(col_1) != !!sym(col_2))
      cat('\n')
    
      my_df %>% 
        select(id, col_1, col_2) %>% 
        filter(!!sym(col_1) != !!sym(col_2))
    }
    
    ### testing: filteron as a string
    mydiff("a")
    #> 
    #> without sym and unquote
    #> col_1 != col_2
    #> 
    #> with sym and unquote
    #> ax != ay
    #> 
    #> # A tibble: 1 x 3
    #>      id ax    ay   
    #>   <dbl> <chr> <chr>
    #> 1    14 f     k
    
    ### testing: filteron as a symbol
    mydiff(a)
    #> 
    #> without sym and unquote
    #> col_1 != col_2
    #> 
    #> with sym and unquote
    #> ax != ay
    #>  
    #> # A tibble: 1 x 3
    #>      id ax    ay   
    #>   <dbl> <chr> <chr>
    #> 1    14 f     k
    

    Created on 2018-09-28 by the reprex package (v0.2.1.9000)
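
If the column name will always be passed as a string, another option is the .data pronoun, which looks a column up by name and avoids sym() and !! altogether. The following is only a minimal sketch, assuming dplyr 1.0.0 or later (for all_of() inside select()); mydiff_str is a hypothetical name used here for illustration and is not part of the original code.

```r
library(dplyr)

# Same fake data as in the question
df1 <- tibble(id = seq(4, 19, 2),
              a = c("a", "b", "c", "d", "e", "f", "g", "h"),
              b = c(rep("foo", 3), rep("bar", 5)))
df2 <- tibble(id = seq(10, 20, 1),
              a = c("d", "a", "e", "f", "k", "m", "g", "i", "h", "a", "b"),
              b = c(rep("bar", 7), rep("foo", 4)))

# Hypothetical string-only variant of mydiff()
mydiff_str <- function(filteron, df_1 = df1, df_2 = df2) {
  # Build the suffixed column names as plain strings
  col_1 <- paste0(filteron, "x")
  col_2 <- paste0(filteron, "y")

  inner_join(df_1, df_2, by = "id", suffix = c("x", "y")) %>%
    # all_of() tells select() that the strings are column names
    select(id, all_of(c(col_1, col_2))) %>%
    # .data[[...]] fetches each column by its name, so the comparison
    # is between column values rather than between the strings "ax" and "ay"
    filter(.data[[col_1]] != .data[[col_2]])
}

res_a <- mydiff_str("a")
res_b <- mydiff_str("b")
```

Unlike the sym() version, this sketch does not accept a bare symbol such as mydiff_str(a); it trades that flexibility for not needing to load rlang explicitly.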