I'm trying to simplify a piece of code in my script.
I want to group by every possible pair of categorical variables and compute the mean of my explanatory variable for each group.
Example using the mpg dataset from ggplot2:
library(tidyverse)
mpg %>% group_by(manufacturer, model) %>% summarise(mean = mean(hwy))
mpg %>% group_by(manufacturer, year) %>% summarise(mean = mean(hwy))
mpg %>% group_by(manufacturer, cyl) %>% summarise(mean = mean(hwy))
(this would continue until every combination of categorical columns has been covered)
mpg %>% group_by(cyl, year) %>% summarise(mean = mean(hwy))
etc...
My actual dataset has hundreds of categorical variables, so I would like to automate the process, for example with a for loop or purrr.
Thanks
This uses purrr to select character and factor columns and then combn() to select all of the combinations.
library(ggplot2)
library(purrr)
library(dplyr)
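# Names of the character and factor columns
cat_cols <- names(mpg)[map_lgl(mpg, ~ is.character(.x) || is.factor(.x))]
# Summarise hwy for every pair of those columns; returns a list of tibbles
combn(cat_cols, 2, function(x) {
  mpg %>% group_by(across(all_of(x))) %>% summarise(mean = mean(hwy), .groups = "drop")
}, simplify = FALSE)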
Note that this can become messy, as choose(100, 2) evaluates to 4,950 combinations.
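One thing to watch for: in mpg, year and cyl are stored as integers, so a character/factor test will not pick them up even though the question treats them as categorical. With many results it also helps to name each list element after its pair of columns. The following is only a sketch under those assumptions; the hand-written cat_cols vector and the "col1_col2" naming scheme are illustrative choices, not part of the original answer.

```r
library(ggplot2)  # provides the mpg dataset
library(dplyr)
library(purrr)

# Assumption: the grouping columns are listed by hand, because year and cyl
# are integers in mpg and would be missed by an is.character/is.factor test.
cat_cols <- c("manufacturer", "model", "year", "cyl")

# Every unordered pair of columns, as a list of length-2 character vectors.
pairs <- combn(cat_cols, 2, simplify = FALSE)

# Name each pair "col1_col2" (hypothetical scheme) so results can be looked up.
names(pairs) <- map_chr(pairs, ~ paste(.x, collapse = "_"))

# One summary table per pair; map() keeps the names of `pairs`.
results <- map(pairs, function(x) {
  mpg %>%
    group_by(across(all_of(x))) %>%
    summarise(mean = mean(hwy), .groups = "drop")
})

# Look up a single summary by name.
results$manufacturer_year
```

Keeping the results in a named list means a single table can be pulled out by name instead of by position, which is easier to manage once the number of pairs runs into the thousands.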