r dataframe tidyverse data-manipulation

Passing column names as strings does not work with filter(), and only with filter()


Given the following dataframe:

df <- data.frame(a=c(NA,1,2), b=c(3,4,5))

I can pass a column name as a string to select:

> df %>% select("a")
   a
1 NA
2  1
3  2

Or I can use symbolic names with select. That's fine too:

> df %>% select(a)
   a
1 NA
2  1
3  2

pull accepts both as well:

> df %>% pull("a")
[1] NA  1  2
> df %>% pull(a)
[1] NA  1  2

But I cannot use strings with filter:

> df %>% filter("a"==1)
[1] a b
<0 rows> (or 0-length row.names)

only symbolic names:

> df %>% filter(a==1)
  a b
1 1 4

Why does it work with select but not with filter?

Shouldn't it be consistent?


Solution

  • "a" is an argument to select and pull, but it is not an argument to filter: there it is only one operand inside the expression "a" == 1, so the situations are not the same. Also, the code below, which returns the rows in which column a equals the letter "a", would no longer work if filter were allowed to interpret "a" as column a.

    data.frame(a = letters) %>% filter( a == "a" )
    ##   a
    ## 1 a
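
The zero-row result in the question can be reproduced without dplyr. The following is a minimal base R sketch, assuming the df from the question; cmp and res are illustrative names that do not appear in the original.

```r
df <- data.frame(a = c(NA, 1, 2), b = c(3, 4, 5))

# "a" is a character constant here, not a reference to column a;
# 1 is coerced to "1", so the comparison is a single FALSE
cmp <- "a" == 1
cmp

# recycling that single FALSE over every row keeps nothing,
# which mirrors the empty result of filter("a" == 1) in the question
res <- df[rep(cmp, nrow(df)), ]
nrow(res)
```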
    

    1) dplyr provides if_any and if_all, which accept column names given as strings:

    library(dplyr)
    
    df %>%
      filter(if_any("a") == 1)
    ##   a b
    ## 1 1 4
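
if_any and if_all are more commonly called with a predicate function, and that form extends to several columns at once. The sketch below assumes dplyr 1.0.4 or later and uses a hypothetical character vector cols, which is not part of the original answer, wrapped in all_of() because it is an external variable.

```r
library(dplyr)

df <- data.frame(a = c(NA, 1, 2), b = c(3, 4, 5))
cols <- c("a", "b")  # hypothetical: column names held as strings

# keep rows where every named column is greater than 1
all_gt1 <- df %>% filter(if_all(all_of(cols), ~ .x > 1))

# keep rows where at least one named column is greater than 1
any_gt1 <- df %>% filter(if_any(all_of(cols), ~ .x > 1))
```

Because of the NA in column a, if_all drops the first row, while if_any keeps it since b > 1 is TRUE there.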
    

    2) filter_at has been superseded by the syntax in (1), but superseded is not the same as deprecated: it will continue to be available, so it is OK to use even though the dplyr developers no longer prefer it.

    df %>%
      filter_at("a", all_vars(. == 1))
    ##   a b
    ## 1 1 4
    

    Also note that the following used to work and still does, with a warning, but it has been deprecated and will stop working in a future release, so do not use it:

    # deprecated - do not use
    df %>%
      filter(across("a", ~ . == 1))
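
Another option, not mentioned in the answer above, is the .data pronoun available inside dplyr verbs: subsetting it with [[ looks up a column by its string name. This is a minimal sketch; filter_equal and its arguments col and value are hypothetical names introduced only for illustration.

```r
library(dplyr)

df <- data.frame(a = c(NA, 1, 2), b = c(3, 4, 5))

# hypothetical helper: keep rows where the column named by the
# string `col` equals `value`
filter_equal <- function(data, col, value) {
  data %>% filter(.data[[col]] == value)
}

res <- filter_equal(df, "a", 1)
res
```

This form may be convenient when the column name arrives as a function argument, since the string is used only as a lookup key and is never compared as a literal.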
    

    Note

    Input from question

    df <- data.frame(a = c(NA, 1, 2), b = c(3, 4, 5))