I've looked at other answers but cannot find a solution that makes the code below work. Basically, I'm creating a function that applies inner_join to two data frames and then filters the result based on a column passed to the function.
The problem is that the filter part of the function doesn't work. However, it works if I take filter out of the function and append it, like mydiff("a") %>% filter(ax != ay)
Any suggestion is helpful.
Note that I am passing the function input in quotes.
library(dplyr)
# fake data
df1 <- tibble(id = seq(4, 19, 2),
              a = c("a", "b", "c", "d", "e", "f", "g", "h"),
              b = c(rep("foo", 3), rep("bar", 5)))
df2 <- tibble(id = seq(10, 20, 1),
              a = c("d", "a", "e", "f", "k", "m", "g", "i", "h", "a", "b"),
              b = c(rep("bar", 7), rep("foo", 4)))
# What I am trying to do
dplyr::inner_join(df1, df2, by = "id") %>% select(id, b.x, b.y) %>% filter(b.x!=b.y)
#> # A tibble: 1 x 3
#> id b.x b.y
#> <dbl> <chr> <chr>
#> 1 18 bar foo
# creating a function so that I can filter by difference in column if I have more columns
mydiff <- function(filteron, df_1 = df1, df_2 = df2){
  require(dplyr, warn.conflicts = F)
  col_1 = paste0(quo_name(filteron), "x")
  col_2 = paste0(quo_name(filteron), "y")
  my_df <- inner_join(df_1, df_2, by = "id", suffix = c("x", "y"))
  my_df %>% select(id, col_1, col_2) %>% filter(col_1 != col_2)
}
# the filter part is not working as expected.
# There is no difference whether I pipe filter or leave it out
mydiff("a")
#> # A tibble: 5 x 3
#> id ax ay
#> <dbl> <chr> <chr>
#> 1 10 d d
#> 2 12 e e
#> 3 14 f k
#> 4 16 g g
#> 5 18 h h
The reason it did not work in your original function is that col_1 was a string, but dplyr::filter() expects an "unquoted" variable for the LHS. Thus, you need to first convert col_1 to a symbol using sym(), then unquote it inside filter() using !! (bang bang).
rlang has a really nice function, qq_show(), that shows what actually happens with quoting/unquoting (see the output below).
See also this similar question
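To make the failure mode concrete, here is a minimal sketch of why filtering on the strings keeps every row. The toy data and the names toy, kept_all, and kept_diff are hypothetical and not part of the question; it assumes dplyr and rlang are installed.

```r
library(rlang)
library(dplyr)

# Hypothetical toy data (not from the question)
toy <- tibble(ax = c("d", "f"), ay = c("d", "k"))
col_1 <- "ax"
col_2 <- "ay"

# col_1 and col_2 are not columns of `toy`, so filter() picks up the strings
# from the calling environment and compares "ax" != "ay". That is a single
# TRUE, which is recycled to every row, so nothing is dropped.
kept_all <- toy %>% filter(col_1 != col_2)

# Converting the strings to symbols and unquoting them makes filter()
# compare the columns ax and ay themselves.
kept_diff <- toy %>% filter(!!sym(col_1) != !!sym(col_2))

nrow(kept_all)
nrow(kept_diff)
```

The first filter keeps both rows, while the second keeps only the row where the two columns differ.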
library(rlang)
library(dplyr)
# creating a function that can take either string or symbol as input
mydiff <- function(filteron, df_1 = df1, df_2 = df2) {

  col_1 <- paste0(quo_name(enquo(filteron)), "x")
  col_2 <- paste0(quo_name(enquo(filteron)), "y")

  my_df <- inner_join(df_1, df_2, by = "id", suffix = c("x", "y"))

  cat('\nwithout sym and unquote\n')
  qq_show(col_1 != col_2)

  cat('\nwith sym and unquote\n')
  qq_show(!!sym(col_1) != !!sym(col_2))
  cat('\n')

  my_df %>%
    select(id, col_1, col_2) %>%
    filter(!!sym(col_1) != !!sym(col_2))
}
### testing: filteron as a string
mydiff("a")
#>
#> without sym and unquote
#> col_1 != col_2
#>
#> with sym and unquote
#> ax != ay
#>
#> # A tibble: 1 x 3
#> id ax ay
#> <dbl> <chr> <chr>
#> 1 14 f k
### testing: filteron as a symbol
mydiff(a)
#>
#> without sym and unquote
#> col_1 != col_2
#>
#> with sym and unquote
#> ax != ay
#>
#> # A tibble: 1 x 3
#> id ax ay
#> <dbl> <chr> <chr>
#> 1 14 f k
Created on 2018-09-28 by the reprex package (v0.2.1.9000)
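If the column name is always passed as a string, another option is the .data pronoun, which avoids sym() and !! altogether. The sketch below is an assumption-laden variant rather than the function above: mydiff_data is a hypothetical name, it accepts strings only, and it assumes a reasonably recent dplyr (1.0 or later) for all_of().

```r
library(dplyr)

# Same fake data as in the question
df1 <- tibble(id = seq(4, 19, 2),
              a = c("a", "b", "c", "d", "e", "f", "g", "h"),
              b = c(rep("foo", 3), rep("bar", 5)))
df2 <- tibble(id = seq(10, 20, 1),
              a = c("d", "a", "e", "f", "k", "m", "g", "i", "h", "a", "b"),
              b = c(rep("bar", 7), rep("foo", 4)))

# Hypothetical string-only variant of mydiff()
mydiff_data <- function(filteron, df_1 = df1, df_2 = df2) {
  col_1 <- paste0(filteron, "x")
  col_2 <- paste0(filteron, "y")

  inner_join(df_1, df_2, by = "id", suffix = c("x", "y")) %>%
    # all_of() selects columns named by a character vector
    select(id, all_of(c(col_1, col_2))) %>%
    # .data[[name]] looks the column up by its string name
    filter(.data[[col_1]] != .data[[col_2]])
}

res_a <- mydiff_data("a")
res_b <- mydiff_data("b")
```

With the data above, mydiff_data("a") should return the single row with id 14, and mydiff_data("b") the row with id 18, in line with the outputs shown earlier.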