This feels like a really simple operation: compute the mean by group from one dataframe and merge it into another pre-formatted dataframe. My UDF does this, and that is not the part I'm struggling with.
What I want is for my function to iterate through a series (list, vector, etc.) of arguments.
I want to quickly be able to build a list (or vector - I'm not set on using lists) of variables and pass it to the function as the argument, so it builds a dataframe with all the variables I feed it in that list. My real database has 50+ variables and I want to make all different kinds of new dataframes with different combinations of the variables. One list might have 5 variables, the other might have 25. But I'm open to the idea that I have something conceptually wrong and that I should be using a loop, purrr, map, apply, or some other package, or that I should change the way my function is written. What am I missing?
library(tidyverse)
data_sample <- data.frame(
Name = c("Dalton Campbell", "Dalton Campbell", "Dalton Campbell", "Andre Walker", "Andre Walker", "Andre Walker"),
Defense_Grade = c(88, 86, 92, 94, 97, 95),
Tackle_Grade = c(66, 69, 72, 74, 76, 78),
Coverage_Grade = c(44, 43, 44, 76, 73, 78)
)
#Here I set up the dataframe which the function will bind to
data_sample_averages <- data_sample %>%
  group_by(Name) %>%
  dplyr::summarise(Defense_Grade_Average = mean(Defense_Grade))
#> `summarise()` ungrouping output (override with `.groups` argument)
#Function which computes average of variable (the only argument) and merges it back to data_sample_averages
get_avg2 <- function(v_name) {
  avg <- "_Average"
  data_1 <- data_sample %>%
    dplyr::group_by(Name) %>%
    dplyr::summarise("{{ v_name }}_{avg}" := mean({{ v_name }}, na.rm = TRUE))
  data_sample_averages <- merge(data_sample_averages, data_1, by = "Name")
  return(data_sample_averages)
}
#This works - it computes the average of Tackle_Grade and binds it to data_sample_averages
#However my real dataframe has 50+ columns and I don't want to copy and paste this line 50 times, changing the argument every time.
data_sample_averages <- get_avg2(Tackle_Grade)
#> `summarise()` ungrouping output (override with `.groups` argument)
#shows you the averages
print(data_sample_averages)
#>              Name Defense_Grade_Average Tackle_Grade__Average
#> 1    Andre Walker              95.33333                    76
#> 2 Dalton Campbell              88.66667                    69
#Neither of these works - this is where I'm stuck
#I want my function to iterate through a list of arguments, which are essentially just character strings, in order for the UDF to work
variable_list <- list("Defense_Grade", "Tackle_Grade", "Coverage Grade")
data_sample_averages <- lapply(variable_list, get_avg2)
#> Warning in mean.default(~"Defense_Grade", na.rm = TRUE): argument is not numeric
#> or logical: returning NA
#> Warning in mean.default(~"Defense_Grade", na.rm = TRUE): argument is not numeric
#> or logical: returning NA
#> `summarise()` ungrouping output (override with `.groups` argument)
#> Warning in mean.default(~"Tackle_Grade", na.rm = TRUE): argument is not numeric
#> or logical: returning NA
#> Warning in mean.default(~"Tackle_Grade", na.rm = TRUE): argument is not numeric
#> or logical: returning NA
#> `summarise()` ungrouping output (override with `.groups` argument)
#> Warning in mean.default(~"Coverage Grade", na.rm = TRUE): argument is not
#> numeric or logical: returning NA
#> Warning in mean.default(~"Coverage Grade", na.rm = TRUE): argument is not
#> numeric or logical: returning NA
#> `summarise()` ungrouping output (override with `.groups` argument)
data_sample_averages <- purrr::map(variable_list, get_avg2)
#> Warning in mean.default(~"Defense_Grade", na.rm = TRUE): argument is not numeric
#> or logical: returning NA
#> Warning in mean.default(~"Defense_Grade", na.rm = TRUE): argument is not numeric
#> or logical: returning NA
#> `summarise()` ungrouping output (override with `.groups` argument)
#> Warning in mean.default(~"Tackle_Grade", na.rm = TRUE): argument is not numeric
#> or logical: returning NA
#> Warning in mean.default(~"Tackle_Grade", na.rm = TRUE): argument is not numeric
#> or logical: returning NA
#> `summarise()` ungrouping output (override with `.groups` argument)
#> Warning in mean.default(~"Coverage Grade", na.rm = TRUE): argument is not
#> numeric or logical: returning NA
#> Warning in mean.default(~"Coverage Grade", na.rm = TRUE): argument is not
#> numeric or logical: returning NA
#> `summarise()` ungrouping output (override with `.groups` argument)
You said you were not set on using lists, so I'm using vectors.
My solution relies on the across() function, which was introduced in dplyr 1.0.0.
library(tidyverse)
data_sample <- data.frame(
Name = c("Dalton Campbell", "Dalton Campbell", "Dalton Campbell", "Andre Walker", "Andre Walker", "Andre Walker"),
Defense_Grade = c(88, 86, 92, 94, 97, 95),
Tackle_Grade = c(66, 69, 72, 74, 76, 78),
Coverage_Grade = c(44, 43, 44, 76, 73, 78)
)
# The function
compute_avg <- function(.data, names) {
  .data %>%
    group_by(Name) %>%
    summarise(
      across(
        # {{ }} forwards the caller's column selection to across()
        .cols = {{ names }},
        .fns = ~ mean(.x, na.rm = TRUE),
        .names = "{.col}_Average"
      )
    )
}
compute_avg(.data = data_sample, names = c(Defense_Grade, Tackle_Grade))
#> # A tibble: 2 x 3
#>   Name            Defense_Grade_Average Tackle_Grade_Average
#>   <chr>                           <dbl>                <dbl>
#> 1 Andre Walker                     95.3                   76
#> 2 Dalton Campbell                  88.7                   69
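Since the question builds its variable list out of character strings, a variant that accepts a character vector may be closer to what was asked. The following is only a minimal sketch, assuming dplyr 1.0.0 or later. The names compute_avg_chr and variable_names are made up for this example and are not part of the code above; all_of() is the tidyselect helper that dplyr re-exports for selecting columns by name.

```r
library(dplyr)

data_sample <- data.frame(
  Name = c("Dalton Campbell", "Dalton Campbell", "Dalton Campbell",
           "Andre Walker", "Andre Walker", "Andre Walker"),
  Defense_Grade = c(88, 86, 92, 94, 97, 95),
  Tackle_Grade = c(66, 69, 72, 74, 76, 78),
  Coverage_Grade = c(44, 43, 44, 76, 73, 78)
)

# Hypothetical helper: `cols` is a character vector of column names
compute_avg_chr <- function(.data, cols) {
  .data %>%
    group_by(Name) %>%
    summarise(
      across(
        # all_of() selects columns by name and errors if one is missing
        .cols = all_of(cols),
        .fns = ~ mean(.x, na.rm = TRUE),
        .names = "{.col}_Average"
      ),
      # return an ungrouped tibble and silence the regrouping message
      .groups = "drop"
    )
}

# One vector per combination of variables; no loop or map() is needed
variable_names <- c("Defense_Grade", "Tackle_Grade", "Coverage_Grade")
averages <- compute_avg_chr(data_sample, variable_names)
print(averages)
```

Each combination of variables then becomes one character vector and one call. Note that the names must match the column names exactly: "Coverage Grade" with a space, as in the question's list, would make all_of() stop with an error rather than return NA.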