I have the following data (some organizations, kpi measurements, long list of variables (i gave two in the example).
df <- tibble::tribble(
~ORG_NM, ~KPI_NM, ~NUMR_VAL, ~DENO_VAL,
"AAA", "xxx", 8, 10,
"AAA", "xxx", 10, 10,
"BBB", "xxx", 1, 7,
"CCC", "xxx", 9, 3,
"CCC", "yyy", 9, 4,
"DDD", "xxx", 1, 7,
"AAA", "yyy", 8, 3,
"BBB", "yyy", 6, 1
)
I would like to summarize each of the variables and generate a wide table so that each organization has only one record. So far my approach was to use repetitive code in which I need to change, - summarise part with a name of a variable to be summarized and a function this variable should be processed with, - spread part - value = change the name of the new column, - rename_at - add meaningful suffix to all spread columns with values, showing what function was used to achieve them. At the end I need to change the name of the dataframe in full_join to append the new columns.
library(tidyverse)
df_numrtr <- df %>%
group_by(ORG_NM, KPI_NM) %>%
summarise(mean_NUM_VAL = mean(NUMR_VAL)) %>%
spread(key = c(KPI_NM), mean_NUM_VAL) %>%
ungroup() %>%
rename_at(vars(-ORG_NM), function(x) paste0(x, "_num_mean"))
df_denom <- df %>%
group_by(ORG_NM, KPI_NM) %>%
summarise(mean_DENOM_VAL = mean(DENO_VAL)) %>%
spread(key = c(KPI_NM), mean_DENOM_VAL) %>%
ungroup() %>%
rename_at(vars(-ORG_NM), function(x)
paste0(x, "_den_mean"))
df_final <-
df_numrtr %>%
full_join(df_denom) %>%
select(ORG_NM, sort(names(.)))
ORG_NM xxx_den_mean xxx_num_mean yyy_den_mean yyy_num_mean
<chr> <dbl> <dbl> <dbl> <dbl>
1 AAA 10 9 3 8
2 BBB 7 1 1 6
3 CCC 3 9 4 9
4 DDD 7 1 NA NA
i would like to get rid of repetitive code and have a function that will take in a name of a variable and a function. My desired function in pseudocode would look like
fnSummarize <- function(df, my_org_var, my_kpi_var, my_var, my_fun ){
df_output<-df %>%
group_by({{my_groupby_var}}) %>%
summarise(paste0({{my_var}},"_",{{my_fun}}) = my_fun({{my_var}})) %>%
spread(key = {{my_kpi_var}}, paste0(my_var, my_fun)) %>%
ungroup() %>%
rename_at(vars(-{{ my_org_var}}), function(x) paste0(x, {{myfun}}))
return(df_output)
}
How to properly inject column names and a function to be used in the process (like mean, sum, median, sd) to such a function.
You are quite close. The problem is composition of the column name, which I pulled out into a separate line:
fnSummarize <- function(df, my_org_var, my_kpi_var, my_var, my_fun ){
colName <- str_c( rlang::enexpr(my_var),"_",rlang::enexpr(my_fun) )
df %>%
group_by( {{my_org_var}}, {{my_kpi_var}} ) %>%
summarise( !!colName := {{my_fun}}({{my_var}}) ) %>%
spread( key = {{my_kpi_var}}, colName ) %>%
ungroup() %>%
rename_at( vars(-{{my_org_var}}), str_c, "_", colName )
}
The column names are slightly different than your df_numrtr
and df_denom
, but this can be easily adjusted through additional string manipulation. I left it out to keep things clean.
fnSummarize( df, ORG_NM, KPI_NM, NUMR_VAL, mean )
# # A tibble: 4 x 3
# ORG_NM xxx_NUMR_VAL_mean yyy_NUMR_VAL_mean
# <chr> <dbl> <dbl>
# 1 AAA 9 8
# 2 BBB 1 6
# 3 CCC 9 9
# 4 DDD 1 NA
## Demonstrating using sum instead of mean
fnSummarize( df, ORG_NM, KPI_NM, DENO_VAL, sum )
# # A tibble: 4 x 3
# ORG_NM xxx_DENO_VAL_sum yyy_DENO_VAL_sum
# <chr> <dbl> <dbl>
# 1 AAA 20 3
# 2 BBB 7 1
# 3 CCC 3 4
# 4 DDD 7 NA
I also want to point out that you can address your task through pure dplyr
manipulations, without the need for rlang
. For example, here's how you would apply both sum
and mean
:
df %>% group_by( ORG_NM, KPI_NM ) %>%
summarize_at( c("NUMR_VAL", "DENO_VAL"), list(mean=mean,sum=sum) ) %>%
ungroup() %>% gather( "Variable", "Value", -ORG_NM, -KPI_NM ) %>%
mutate( Variable = map2_chr(Variable, KPI_NM, ~str_replace(.x,"VAL",.y)) ) %>%
select( -KPI_NM ) %>% spread( Variable, Value )
# # A tibble: 4 x 9
# ORG_NM DENO_xxx_mean DENO_xxx_sum DENO_yyy_mean DENO_yyy_sum NUMR_xxx_mean
# <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 AAA 10 20 3 3 9
# 2 BBB 7 7 1 1 1
# 3 CCC 3 3 4 4 9
# 4 DDD 7 7 NA NA 1
# # … with 3 more variables: NUMR_xxx_sum <dbl>, NUMR_yyy_mean <dbl>,
# # NUMR_yyy_sum <dbl>