
How to use bracket notation (or an alternative) while programming with dplyr


I'm trying to write a function to calculate toplines (as commonly used in polling data). It needs to include both a "percent" and "valid percent" column.

Here's an example:

library(tidyverse)
# prepare some data
d <- gss_cat %>%
  mutate(tvhours2 = tvhours,
         tvhours2 = replace(tvhours2, tvhours > 5 , "6-8"),
         tvhours2 = replace(tvhours2, tvhours > 8 , "9+"),
         tvhours2 = fct_explicit_na(tvhours2),
         # make a weight variable
         fakeweight = rnorm(n(), mean = 1, sd = .25))

The following function works as far as it goes:

make.topline <- function(variable, data, weight){
  variable <- enquo(variable)
  weight <- enquo(weight)

  table <- data %>%
    # calculate denominator
    mutate(total = sum(!!weight)) %>%
    # calculate proportions
    group_by(!!variable) %>%
    summarise(pct = (sum(!!weight)/first(total))*100,
              n = sum(!!weight))

  table
}
make.topline(variable = tvhours2, data = d, weight = fakeweight)

[Image: topline table without the valid percent field]

I'm struggling to implement the valid percent column. Here is the syntax I tried.

make.topline2 <- function(variable, data, weight){
  variable <- enquo(variable)
  weight <- enquo(weight)

  table <- data %>%
    # calculate denominator
    mutate(total = sum(!!weight),
           valid.total = sum(!!weight[!!variable != "(Missing)"])) %>%
    # calculate proportions
    group_by(!!variable) %>%
    summarise(pct = (sum(!!weight)/first(total))*100,
              valid.pct = (sum(!!weight)/first(valid.total))*100,
              n = sum(!!weight))

  table
}

make.topline2(variable = tvhours2, data = d, weight = fakeweight)

This yields the following error:

 Error: Base operators are not defined for quosures.
Do you need to unquote the quosure?

  # Bad:
  myquosure != rhs

  # Good:
  !!myquosure != rhs
Call `rlang::last_error()` to see a backtrace 

I know the problem is in this line, but I don't know how to fix it:

mutate(valid.total = sum(!!weight[!!variable != "(Missing)"]))

Solution

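  • You can put parentheses around the !!weight. This is an order-of-operations issue: [ binds more tightly than !!, so !!weight[...] is read as !!(weight[...]) and the brackets are applied to weight while it is still a quosure. The parentheses make sure the extract brackets are used only after weight has been unquoted.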

    That line would then look like:

    valid.total = sum((!!weight)[!!variable != "(Missing)"])
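
    To see the precedence point concretely, here is a small base-R sketch that only inspects how the two spellings parse. Nothing is evaluated, and keep is just a placeholder name standing in for the !!variable != "(Missing)" condition.

    ```r
    # Without parentheses: the outermost call is `!`, and what sits under
    # the two `!` calls is the whole subsetting expression weight[keep].
    no_parens <- quote(!!weight[keep])
    inner <- no_parens[[2]][[2]]
    print(inner)              # the operand that `!!` would try to unquote

    # With parentheses: the outermost call is `[`, applied to (!!weight).
    with_parens <- quote((!!weight)[keep])
    print(with_parens[[1]])   # the function being called
    print(with_parens[[2]])   # the object being subset
    ```

    In the first form the brackets are applied to weight itself, which inside your function is still a quosure, and that is what the error is complaining about.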

    Alternatively, you could use the curly-curly operator ({{ }}), introduced in rlang 0.4.0, which takes the place of the enquo()/!! combination in relatively simple cases like yours. Your function would then look something like this:

    make.topline <- function(variable, data, weight){
    
        table <- data %>%
            # calculate denominator
            mutate(total = sum({{ weight }}),
                   valid.total = sum({{ weight }}[{{ variable }} != "(Missing)"])) %>%
            # calculate proportions
            group_by({{ variable }}) %>%
            summarise(pct = (sum({{ weight }})/first(total))*100,
                      valid.pct = (sum({{ weight }})/first(valid.total))*100,
                      n = sum({{ weight }}))
    
        table
    }
    

    Like the parentheses solution, this runs without error.

    make.topline(variable = tvhours2, data = d, weight = fakeweight)
    
    # A tibble: 9 x 4
      tvhours2    pct valid.pct      n
      <fct>     <dbl>     <dbl>  <dbl>
    1 0          3.16      5.98   679.
    2 1         10.9      20.6   2342.
    3 2         14.1      26.6   3022.
    4 3          9.10     17.2   1957.
    5 4          6.67     12.6   1432.
    6 5          3.24      6.13   696.
    7 6-8        4.02      7.61   864.
    8 9+         1.67      3.16   358.
    9 (Missing) 47.2      89.3  10140.
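
    Note that in the output above the (Missing) row still gets a valid.pct value (89.3), which is just its weight divided by the valid total and is probably not something you want to report. If you would rather blank it out, here is a minimal sketch, assuming the missing level is always labelled "(Missing)". The names make.topline3, toy, answer and wt are made up for illustration and the toy data is not your gss_cat example.

    ```r
    library(dplyr)

    # Hypothetical toy data standing in for `d`
    toy <- tibble(
      answer = factor(c("yes", "no", "(Missing)", "yes")),
      wt     = c(1, 2, 3, 4)
    )

    make.topline3 <- function(variable, data, weight) {
      data %>%
        # calculate both denominators
        mutate(total = sum({{ weight }}),
               valid.total = sum({{ weight }}[{{ variable }} != "(Missing)"])) %>%
        # calculate proportions
        group_by({{ variable }}) %>%
        summarise(pct = (sum({{ weight }}) / first(total)) * 100,
                  valid.pct = (sum({{ weight }}) / first(valid.total)) * 100,
                  n = sum({{ weight }})) %>%
        # set valid.pct to NA for the missing category
        mutate(valid.pct = if_else({{ variable }} == "(Missing)",
                                   NA_real_, valid.pct))
    }

    res <- make.topline3(variable = answer, data = toy, weight = wt)
    res
    ```

    With this change the valid.pct values of the non-missing rows add up to 100, while pct still adds up to 100 across all rows.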