Tags: r, function, dataframe, dplyr, sapply

What is the best way to loop through function arguments?


This feels like a really simple operation: compute the mean by group from one dataframe and merge it into another, pre-formatted dataframe. My user-defined function (UDF) already does this, so that is not the part I'm struggling with.

What I want is for my function to iterate through a series (list, vector, etc.) of arguments.

I want to be able to quickly build a list (or vector; I'm not set on using lists) of variables and pass it to the function as the argument, so that it builds a dataframe with all the variables in that list. My real database has 50+ variables, and I want to make many different dataframes from different combinations of them. One list might have 5 variables, another might have 25. I'm open to the idea that I have something conceptually wrong and should be using a loop, purrr, map, apply, or some other package, or that I should change the way my function is written. What am I missing?

library(tidyverse)

data_sample <- data.frame(
  Name = c("Dalton Campbell", "Dalton Campbell", "Dalton Campbell", "Andre Walker", "Andre Walker", "Andre Walker"),
  Defense_Grade = c(88, 86, 92, 94, 97, 95),
  Tackle_Grade = c(66, 69, 72, 74, 76, 78),
  Coverage_Grade = c(44, 43, 44, 76, 73, 78)
)

#Here I set up the dataframe which the function will bind to 
data_sample_averages <-  data_sample %>% 
  group_by(Name) %>% 
  dplyr::summarise(Defense_Grade_Average = mean(Defense_Grade))
#> `summarise()` ungrouping output (override with `.groups` argument)


#Function which computes average of variable (the only argument) and merges it back to data_sample_averages
get_avg2 <- function(v_name) {
  
  avg <- "_Average"      
  
  data_1 <-  data_sample %>% 
    dplyr::group_by(Name) %>% 
    dplyr::summarise("{{ v_name }}_{avg}" := mean({{ v_name }}, na.rm = TRUE))
  
  data_sample_averages <- merge(data_sample_averages, data_1, by = "Name")
  
  return(data_sample_averages)

}

#This works - it computes the average of Tackle_Grade and binds it to data_sample_averages
#However my real dataframe has 50+ columns and I don't want to copy and paste this line 50 times, changing the argument every time.
data_sample_averages <- get_avg2(Tackle_Grade)
#> `summarise()` ungrouping output (override with `.groups` argument)

#shows you the averages
print(data_sample_averages)
#>              Name Defense_Grade_Average Tackle_Grade__Average
#> 1    Andre Walker              95.33333                    76
#> 2 Dalton Campbell              88.66667                    69


#Neither of these works - this is where I'm stuck
#I want my function to iterate through a list of arguments, which are essentially just character strings, in order for the UDF to work
variable_list <- list("Defense_Grade", "Tackle_Grade", "Coverage Grade")

data_sample_averages <- lapply(variable_list, get_avg2)
#> Warning in mean.default(~"Defense_Grade", na.rm = TRUE): argument is not numeric
#> or logical: returning NA

#> Warning in mean.default(~"Defense_Grade", na.rm = TRUE): argument is not numeric
#> or logical: returning NA
#> `summarise()` ungrouping output (override with `.groups` argument)
#> Warning in mean.default(~"Tackle_Grade", na.rm = TRUE): argument is not numeric
#> or logical: returning NA
#> Warning in mean.default(~"Tackle_Grade", na.rm = TRUE): argument is not numeric
#> or logical: returning NA
#> `summarise()` ungrouping output (override with `.groups` argument)
#> Warning in mean.default(~"Coverage Grade", na.rm = TRUE): argument is not
#> numeric or logical: returning NA
#> Warning in mean.default(~"Coverage Grade", na.rm = TRUE): argument is not
#> numeric or logical: returning NA
#> `summarise()` ungrouping output (override with `.groups` argument)

data_sample_averages <- purrr::map(variable_list, get_avg2)
#> Warning in mean.default(~"Defense_Grade", na.rm = TRUE): argument is not numeric
#> or logical: returning NA
#> Warning in mean.default(~"Defense_Grade", na.rm = TRUE): argument is not numeric
#> or logical: returning NA
#> `summarise()` ungrouping output (override with `.groups` argument)
#> Warning in mean.default(~"Tackle_Grade", na.rm = TRUE): argument is not numeric
#> or logical: returning NA
#> Warning in mean.default(~"Tackle_Grade", na.rm = TRUE): argument is not numeric
#> or logical: returning NA
#> `summarise()` ungrouping output (override with `.groups` argument)
#> Warning in mean.default(~"Coverage Grade", na.rm = TRUE): argument is not
#> numeric or logical: returning NA
#> Warning in mean.default(~"Coverage Grade", na.rm = TRUE): argument is not
#> numeric or logical: returning NA
#> `summarise()` ungrouping output (override with `.groups` argument)


Solution

  • You said you were not set on using lists, so I'm using vectors.

    My solution relies on the across() function, which is available in dplyr 1.0.0 and later.

    library(tidyverse)
    
    data_sample <- data.frame(
      Name = c("Dalton Campbell", "Dalton Campbell", "Dalton Campbell", "Andre Walker", "Andre Walker", "Andre Walker"),
      Defense_Grade = c(88, 86, 92, 94, 97, 95),
      Tackle_Grade = c(66, 69, 72, 74, 76, 78),
      Coverage_Grade = c(44, 43, 44, 76, 73, 78)
    )
    
    # The function
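    compute_avg <- function(.data, names){

        .data %>%
        group_by(Name) %>%
        summarise(
          across(
            # {{ }} forwards the bare column names to across() for tidy selection
            .cols = {{ names }},
            # Average each selected column within each Name group
            .fns = ~ mean(.x, na.rm = TRUE),
            # Name each output column after its input column
            .names = "{.col}_Average"
          )
        )
    }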
    
    compute_avg(.data = data_sample, names = c(Defense_Grade, Tackle_Grade))
    
    # A tibble: 2 x 3
      Name            Defense_Grade_Average Tackle_Grade_Average
      <chr>                           <dbl>                <dbl>
    1 Andre Walker                     95.3                   76
    2 Dalton Campbell                  88.7                   69
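
If you would rather keep the column names as character strings, as in variable_list in the question, one option is all_of() from tidyselect, which dplyr re-exports. The lapply() and purrr::map() attempts failed because {{ }} expects a bare column name, so mean() received the string itself rather than the column. What follows is only a minimal sketch, assuming dplyr 1.0.0 or later. The names compute_avg_chr and variable_vec are made up for illustration, and "Coverage_Grade" is written with an underscore so that it matches the column in data_sample.

```r
library(dplyr)

data_sample <- data.frame(
  Name = c("Dalton Campbell", "Dalton Campbell", "Dalton Campbell",
           "Andre Walker", "Andre Walker", "Andre Walker"),
  Defense_Grade = c(88, 86, 92, 94, 97, 95),
  Tackle_Grade = c(66, 69, 72, 74, 76, 78),
  Coverage_Grade = c(44, 43, 44, 76, 73, 78)
)

# Hypothetical helper: `names` is a character vector of column names
compute_avg_chr <- function(.data, names) {
  .data %>%
    group_by(Name) %>%
    summarise(
      across(
        # all_of() selects the columns named in the character vector and
        # raises an error if any of them is missing from the data
        .cols = all_of(names),
        .fns = ~ mean(.x, na.rm = TRUE),
        .names = "{.col}_Average"
      ),
      .groups = "drop"
    )
}

# Any combination of columns can be supplied as plain strings
variable_vec <- c("Defense_Grade", "Tackle_Grade", "Coverage_Grade")
result <- compute_avg_chr(data_sample, variable_vec)
print(result)
```

Because names is an ordinary character vector here, different combinations of variables can be built with c() or by subsetting names(data_sample), and then passed to the same function unchanged.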