Problem
By using dplyr::summarize_at()
(or equivalent), I would like to get a table of summaries in which columns are sorted first by (G) order of grouping variables used, then by (V) order of variables passed and lastly by (F) order of functions applied. The default order is determined first by G, then by F and lastly by V.
Example
The code:
library(purrr)
library(dplyr)
q025 <- partial(quantile, probs = 0.025, na.rm = TRUE)
q975 <- partial(quantile, probs = 0.975, na.rm = TRUE)
vars_to_summarize <- c("height", "mass")
my_summary <- starwars %>%
filter(skin_color %in% c("gold", "green")) %>%
group_by(skin_color) %>%
summarise_at(vars_to_summarize, funs(q025, mean, q975))
Results in:
my_summary
## A tibble: 2 x 7
## skin_color height_q025 mass_q025 height_mean mass_mean height_q975 mass_q975
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 gold 167.000 75.0 167 75 167.00 75.0
## 2 green 79.375 22.7 169 NA 204.75 110.4
And the desired order of variables should be:
skin_color, height_q025, height_mean, height_q975, mass_q025, mass_mean, mass_q975
I would like to use something like this (naively simple) code:
my_summary %>%
select(everything(), starts_with(vars_to_summarize))
But it does not work. Even this code does not work as I expect (even though it's not a general solution I seek):
my_summary %>%
select(everything(),
starts_with(vars_to_summarize[1]),
starts_with(vars_to_summarize[2]))
Most probably everything()
should always be the last argument in select()
.
To generalize
Say, I have:
group_by()
,In general, the desired order of variables in the summary table should follow the pattern:
gr_1, gr_2, ..., gr_N,
var_1_fun_1, var_1_fun_2, ..., var_1_fun_M,
var_2_fun_1, var_2_fun_2, ..., var_2_fun_M,
...,
var_L_fun_1, var_L_fun_2, ..., var_L_fun_M
We can use matches
and grep
my_summary %>%
select(grep(paste(vars_to_summarize, collapse="|"), names(.), invert = TRUE),
matches(vars_to_summarize[1]),
matches(vars_to_summarize[2]))
# A tibble: 2 x 7
# skin_color height_q025 height_mean height_q975 mass_q025 mass_mean mass_q975
# <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#1 gold 167.000 167 167.00 75.0 75 75.0
#2 green 79.375 169 204.75 22.7 NA 110.4
If there are many columns, then another option would be to remove the substring from _
in the column names, match
with the 'vars_to_summarize' and order
within the select
my_summary %>%
select(order(match(sub("_.*", "", names(.)), vars_to_summarize, nomatch = 0)))
# A tibble: 2 x 7
# skin_color height_q025 height_mean height_q975 mass_q025 mass_mean mass_q975
# <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#1 gold 167.000 167 167.00 75.0 75 75.0
#2 green 79.375 169 204.75 22.7 NA 110.4