Tags: r, dplyr, tidyverse, rlang, tidyeval

Arrange function in dplyr 0.7.1


I am trying to use the new quo functionality while writing a function utilizing dplyr and ran into the following issue:

library(dplyr)  # provides tibble(), arrange(), %>% and enquo()

df <- tibble(
  g1 = c(1, 1, 2, 2, 2),
  g2 = c(1, 2, 1, 3, 1),
  a = sample(5),
  b = sample(5)
)

Arranging the data frame by a single variable is straightforward:

my_arrange <- function(df, arrange_var) {
  quo_arrange_var <- enquo(arrange_var)
  df %>%
    arrange(!!quo_arrange_var)
}

But what if I want to set a preferential order? For example, rows where the arrange variable equals 2 should come first, and the remaining rows should be sorted normally. With the previous version of dplyr I would use:

arrange(-(arrange_var == 2), arrange_var)

but with the new quosure-based approach I am not sure how to do this. I have tried:

my_arrange <- function(df, arrange_var) {
  quo_arrange_var <- enquo(arrange_var)

  df %>%
    arrange(-!!quo_arrange_var==2, !!quo_arrange_var)
}

but I get this error:

 Error in arrange_impl(.data, dots) : 
  incorrect size (1) at position 1, expecting : 5 

I have also tried using quo_name:

my_arrange <- function(df, arrange_var) {
  quo_arrange_var <- enquo(arrange_var)

  df %>%
    arrange(-!!(paste0(quo_name(quo_arrange_var), "==2")), !!quo_arrange_var)
}

but get this error:

 Error in arrange_impl(.data, dots) : 
  Evaluation error: invalid argument to unary operator. 

Any help would be appreciated.


Solution

  • The easiest fix is to put parentheses around the bang-bang. The problem is operator precedence: ! binds less tightly than ==, so !!a==b is parsed as !!(a==b) even though you want (!!a)==b. Inside your function this means quo_arrange_var==2 is evaluated first, and for some reason you can compare a quosure to a numeric value (quo(a)==2 returns FALSE), so your expression ends up as arrange(-FALSE, g2). That is a length-1 value where arrange() expects 5, which gives the error message you saw.

    my_arrange <- function(df, arrange_var) {
      quo_arrange_var <- enquo(arrange_var)
    
      df %>%
        arrange(-((!!quo_arrange_var)==2), !!quo_arrange_var)
    }
    my_arrange(df, g2)
    # # A tibble: 5 x 4
    #      g1    g2     a     b
    #   <dbl> <dbl> <int> <int>
    # 1     1     2     5     4
    # 2     1     1     2     5
    # 3     2     1     4     3
    # 4     2     1     3     1
    # 5     2     3     1     2
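
The precedence point in the answer can be checked without dplyr at all. The following is a minimal sketch in base R that only inspects how the parser groups the two spellings; the names a, b, unparenthesized, parenthesized and the three *_call variables are placeholders chosen for this illustration and do not come from the question.

```r
# Base R only: quote() captures an expression without evaluating it,
# so `a` and `b` do not need to exist.
unparenthesized <- quote(!!a == b)
parenthesized   <- quote((!!a) == b)

# Without parentheses the outermost call is `!`,
# and `==` sits inside both negations.
outer_call_plain <- as.character(unparenthesized[[1]])
inner_call_plain <- as.character(unparenthesized[[2]][[2]][[1]])

# With parentheses the outermost call is `==`,
# so the comparison is applied after the unquoting.
outer_call_paren <- as.character(parenthesized[[1]])

print(c(outer_call_plain, inner_call_plain, outer_call_paren))
```

Because the comparison is nested inside the bang-bang in the first form, the unquoting operates on the result of the comparison rather than on the quosure itself, which is why the explicit parentheses in the solution are needed.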