Tags: r, dplyr, lazy-evaluation, nse

Using a select-like mechanism to choose variables for a distinct call in dplyr


Desired results

Using simple syntax, I keep the rows that are distinct on the vs and am columns while also retaining the cyl values.

library(dplyr)

data(mtcars)
dta <- mtcars[, c("vs", "am", "cyl")]
# Desired results
dta %>% distinct(vs, am, .keep_all = TRUE)

Desired syntax

I would like to reverse the syntax above: instead of naming the columns to use, I want to select distinct observations on all columns except cyl, as in the example below:

dta %>% distinct(vars(-contains("cyl")), .keep_all = TRUE)

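That does not work, because distinct() treats vars(...) as an expression defining a new column rather than as a column selection: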

> dta %>% distinct(vars(-contains("cyl")), .keep_all = TRUE)
   vs am cyl vars(-contains("cyl"))
1   0  1   6      ~-contains("cyl")
2   0  1   6      ~-contains("cyl")
3   1  1   4      ~-contains("cyl")
4   1  0   6      ~-contains("cyl")
5   0  0   8      ~-contains("cyl")
6   1  0   6      ~-contains("cyl")
7   0  0   8      ~-contains("cyl")

Solution

  • If you don't mind not using distinct, you can combine group_by_at with slice to get the desired result, i.e. group by every column except cyl and keep the first row of each group:

    library(dplyr)
    
    dta %>% 
     group_by_at(vars(-cyl)) %>% 
     slice(1L)
    
    # A tibble: 4 x 3
    # Groups:   vs, am [4]
    #     vs    am   cyl
    #  <dbl> <dbl> <dbl>
    #1     0     0     8
    #2     0     1     6
    #3     1     0     6
    #4     1     1     4
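
If you would rather stay with distinct, the following is a minimal sketch of two alternatives, not part of the original answer. The first uses the scoped variant distinct_at, which accepts a vars() selection. The second assumes dplyr >= 1.0.0, where across() can be used inside distinct(). The names res_at and res_across are illustrative only.

```r
library(dplyr)

data(mtcars)
dta <- mtcars[, c("vs", "am", "cyl")]

# Scoped variant: distinct_at() takes a vars() selection,
# so negative selection helpers such as -contains("cyl") work.
res_at <- dta %>%
  distinct_at(vars(-contains("cyl")), .keep_all = TRUE)

# Assumption: dplyr >= 1.0.0. across() brings select-style
# column selection into distinct() itself.
res_across <- dta %>%
  distinct(across(-contains("cyl")), .keep_all = TRUE)

res_at
res_across
```

Unlike the group_by_at approach, both calls return an ungrouped data frame, and rows appear in order of first occurrence rather than sorted by the grouping columns.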