
R arguments not being passed to pipe within custom function


I regularly have to perform a piped series of operations that groups by one or more (usually two) variables, finds the mean and confidence interval of one or more variables, and outputs the results to a summary table for plotting or reporting.

Usually I do this by copying and pasting a script, e.g.:

library(dplyr)

aggdata <- data %>% group_by(Time, Category) %>%
    summarise(mean.Volume = mean(Volume, na.rm = TRUE),
              sd.Volume = sd(Volume, na.rm = TRUE),
              n.Volume = n(),
              Volume = sum(Volume)) %>%
    mutate(se.Volume = sd.Volume / sqrt(n.Volume),
           lower.ci.Volume = mean.Volume - qt(1 - (0.05 / 2), n.Volume - 1) * se.Volume,
           upper.ci.Volume = mean.Volume + qt(1 - (0.05 / 2), n.Volume - 1) * se.Volume)
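For reference, the confidence-interval lines in this script compute the usual t-based interval, mean ± qt(0.975, n - 1) * sd / sqrt(n). A minimal sanity check on a made-up sample (not the question's data) is that base R's t.test() reports the same 95% bounds:

```r
# Made-up sample standing in for one Time x Category group
x <- c(12, 15, 11, 14, 13)

n      <- length(x)
se     <- sd(x) / sqrt(n)
t_crit <- qt(1 - (0.05 / 2), n - 1)   # same quantile as in the script

# Interval built exactly as in the script's mutate() step
manual_ci <- c(mean(x) - t_crit * se, mean(x) + t_crit * se)

# t.test() uses a 95% level by default; as.numeric() drops the conf.level attribute
ttest_ci <- as.numeric(t.test(x)$conf.int)

print(manual_ci)
print(ttest_ci)
```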

So I tried writing a function for this. However, for both of the following:

aggvols1 <- function(data, a, b, values) {
   data %>% group_by(a, b) %>%
    summarise(mean.Volume = mean(values, na.rm = TRUE),
              sd.Volume = sd(values, na.rm = TRUE),
              n.Volume = n(),
              Volume = sum(values))%>%
    mutate(se.Volume = sd.Volume / sqrt(n.Volume),
           lower.ci.Volume = mean.Volume - qt(1 - (0.05 / 2), n.Volume - 1) * se.Volume,
           upper.ci.Volume = mean.Volume + qt(1 - (0.05 / 2), n.Volume - 1) * se.Volume)
}

and

aggvols2 <- function(data, a, b, values) {
  groupvars <- c(data$a, data$b) # also does not work if I just use c(a, b)
  data %>% group_by(groupvars) %>%
    summarise(mean.Volume = mean(values, na.rm = TRUE),
              sd.Volume = sd(values, na.rm = TRUE),
              n.Volume = n(),
              Volume = sum(values))%>%
    mutate(se.Volume = sd.Volume / sqrt(n.Volume),
           lower.ci.Volume = mean.Volume - qt(1 - (0.05 / 2), n.Volume - 1) * se.Volume,
           upper.ci.Volume = mean.Volume + qt(1 - (0.05 / 2), n.Volume - 1) * se.Volume)
}

followed by, e.g.,

test <- aggvols1(data=salesdata, a=Participation, b=Time_Period, values=volumes_sold)

returns the same error message:

Error in aggvols1(data=salesdata, a=Participation, b=Time_Period, values=volumes_sold) : 
  unused arguments (a = Participation, b = Time_Period)

How can I make the arguments a and b get passed as the grouping variables so that the function returns a table of grouped means and CIs?

Ultimately, my goal is not just to get this running but to extend it: instead of specifying two grouping columns and a single value column, I would like to pass a vector of grouping variables and a vector of value variables, so that it can group by and summarise one or several columns, adding the name of each input "values" column as a suffix to the corresponding output columns to tell them apart.

Any advice on how to fix the function so it runs and/or how to improve the function as described above would be greatly appreciated; I'm new to writing my own functions but am trying to move towards using them instead of just copying and pasting code where possible.


Solution

  • I would also advise you to use rlang syntax, but I take a slightly different approach. Inside a function you have to quote the arguments (capture them with enquo()) and then unquote them (with !!) so that dplyr treats them as column names in the data rather than as ordinary R objects. The following code works for me. Also have a look at vignette("programming", "dplyr") and the RStudio cheat sheet for rlang here: https://rstudio.com/resources/cheatsheets/.

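    library(dplyr)

    aggvols1 <- function(data, a, b, values) {
    
      # Capture the bare column names supplied by the caller as quosures
      a <- enquo(a)
      b <- enquo(b)
      values <- enquo(values)
    
      # Unquote them with !! so dplyr looks them up as columns of `data`
      data %>% group_by(!! a, !! b) %>%
        summarise(mean.Volume = mean(!! values, na.rm = TRUE),
                  sd.Volume = sd(!! values, na.rm = TRUE),
                  n.Volume = n(),
                  Volume = sum(!! values)) %>%
        mutate(se.Volume = sd.Volume / sqrt(n.Volume),
               lower.ci.Volume = mean.Volume - qt(1 - (0.05 / 2), n.Volume - 1) * se.Volume,
               upper.ci.Volume = mean.Volume + qt(1 - (0.05 / 2), n.Volume - 1) * se.Volume)
    }

    # Usage: test <- aggvols1(data = salesdata, a = Participation, b = Time_Period, values = volumes_sold)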
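The question also asks for a version that takes any number of grouping and value columns and suffixes each output with the value column's name. One possible sketch, assuming dplyr 1.0.0 or later (for across() and .groups) and rlang 0.4.0 or later (for the {{ }} "curly-curly" operator, shorthand for the enquo()/!! pair used above). The names agg_ci and ci_half_width and the toy salesdata below are hypothetical, made up for illustration:

```r
library(dplyr)

# Hypothetical helper: half-width of a t-based confidence interval,
# ignoring NAs (same formula as the question's script).
ci_half_width <- function(x, conf = 0.95) {
  x <- x[!is.na(x)]
  n <- length(x)
  qt(1 - (1 - conf) / 2, df = n - 1) * sd(x) / sqrt(n)
}

# Hypothetical generalised function: group_cols and value_cols accept
# tidyselect specs such as c(Participation, Time_Period).
agg_ci <- function(data, group_cols, value_cols, conf = 0.95) {
  data %>%
    group_by(across({{ group_cols }})) %>%
    summarise(
      across({{ value_cols }},
             list(mean     = ~ mean(.x, na.rm = TRUE),
                  sd       = ~ sd(.x, na.rm = TRUE),
                  n        = ~ sum(!is.na(.x)),   # non-missing count
                  sum      = ~ sum(.x, na.rm = TRUE),
                  se       = ~ sd(.x, na.rm = TRUE) / sqrt(sum(!is.na(.x))),
                  lower.ci = ~ mean(.x, na.rm = TRUE) - ci_half_width(.x, conf),
                  upper.ci = ~ mean(.x, na.rm = TRUE) + ci_half_width(.x, conf)),
             # Output names like mean.volumes_sold, lower.ci.revenue
             .names = "{.fn}.{.col}"),
      .groups = "drop"
    )
}

# Hypothetical toy data standing in for the question's salesdata
salesdata <- data.frame(
  Participation = rep(c("yes", "yes", "no", "no"), 2),
  Time_Period   = rep(c(1, 2), each = 4),
  volumes_sold  = c(10, 14, 7, 9, 12, 16, 11, 13),
  revenue       = c(100, 150, 60, 80, 130, 170, 120, 140)
)

res <- agg_ci(salesdata,
              group_cols = c(Participation, Time_Period),
              value_cols = c(volumes_sold, revenue))
print(res)
```

Computing the CI bounds inside across() avoids a second mutate() pass over dynamically named columns, at the cost of recomputing the mean and sd a few times per group. Note that n here counts non-missing values, whereas n() in the original code counts all rows, including NAs.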