Calculating predicted values for all categories using tidymodels

This question follows up on an earlier, related question.

I am running a model similar to the one in that question, but in the last step I would like to end up with 7 predicted columns, i.e. change the dataset so that in the first case group = 0 for every row, in the second case group = 1, and so on.

# Code from the original question
library(dplyr)

year <- rep(2014:2015, length.out=10000)
group <- sample(c(0,1,2,3,4,5,6), replace=TRUE, size=10000)
value <- sample(10000, replace=T)
female <- sample(c(0,1), replace=TRUE, size=10000)
smoker <- sample(c(0,1), replace=TRUE, size=10000)
dta <- data.frame(year=year, group=group, value=value, female=female, smoker=smoker)

# split the dataset into a list of data frames, one per year/group combination
table_list <- dta %>%
  group_by(year, group) %>%
  group_split()

# fit model per subgroup
model_list <- lapply(table_list, function(x) glm(smoker ~ female*group, data=x,
                                                 family=binomial(link="probit")))

# create new datasets in which group is fixed at a single value
dat_new0 <- data.frame(dta[, c("smoker", "year", "female")], group=0)
dat_new1 <- data.frame(dta[, c("smoker", "year", "female")], group=1)
dat_new2 <- data.frame(dta[, c("smoker", "year", "female")], group=2)
# etc.
 

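# predict() needs a fitted model; model_list[[1]] is used here only as an example
pred0 <- predict(model_list[[1]], newdata = dat_new0, type = "response")
pred1 <- predict(model_list[[1]], newdata = dat_new1, type = "response")
pred2 <- predict(model_list[[1]], newdata = dat_new2, type = "response")
# etc.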

Instead of doing this by hand, I would like to use tidymodels somehow.

Solution

I think I would use broom for this. First, use nest() to split your data into the groupings you want to use for modeling and then map() over them to train your models:

library(tidyverse)
library(broom)

year <- rep(2014:2015, length.out=10000)
group <- sample(c(0,1,2,3,4,5,6), replace=TRUE, size=10000)
female <- sample(c(0,1), replace=TRUE, size=10000)
smoker <- sample(c(0,1), replace=TRUE, size=10000)
dta <- tibble(year = year, group = group, female = female, smoker = smoker)

mods <- dta %>%
    nest(data = c(-year)) %>%
    mutate(model = map(data, ~ glm(smoker ~ female*group, data = .,
                                 family = binomial(link = "probit"))))

mods
#> # A tibble: 2 × 3
#>    year data                 model 
#>   <int> <list>               <list>
#> 1  2014 <tibble [5,000 × 3]> <glm> 
#> 2  2015 <tibble [5,000 × 3]> <glm>
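Before predicting, it can help to confirm that each nested model was fit as intended. The following is a minimal sketch, assuming the same simulated data as above; the `set.seed()` call and the `coefs` name are additions for illustration only. It uses `tidy()` from broom to pull the coefficient table out of every model.

```r
library(tidyverse)
library(broom)

set.seed(123)  # assumption: seed added only to make the sketch reproducible
n <- 10000
dta <- tibble(
  year   = rep(2014:2015, length.out = n),
  group  = sample(0:6, size = n, replace = TRUE),
  female = sample(0:1, size = n, replace = TRUE),
  smoker = sample(0:1, size = n, replace = TRUE)
)

# one probit model per year, as in the answer
mods <- dta %>%
  nest(data = c(-year)) %>%
  mutate(model = map(data, ~ glm(smoker ~ female * group, data = .x,
                                 family = binomial(link = "probit"))))

# one row per year and model term (intercept, female, group, female:group)
coefs <- mods %>%
  mutate(coefs = map(model, tidy)) %>%
  select(year, coefs) %>%
  unnest(coefs)

coefs
```

Each year should contribute four rows, one per term in `female * group`.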

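Now use crossing() from tidyr to create your new example data (only groups 0 to 2 are used here to keep the output short; use group = 0:6 to cover all seven groups):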

new_dat <- crossing(smoker = 0:1, female = 0:1, year = 2014:2015, group = 0:2)
new_dat
#> # A tibble: 24 × 4
#>    smoker female  year group
#>     <int>  <int> <int> <int>
#>  1      0      0  2014     0
#>  2      0      0  2014     1
#>  3      0      0  2014     2
#>  4      0      0  2015     0
#>  5      0      0  2015     1
#>  6      0      0  2015     2
#>  7      0      1  2014     0
#>  8      0      1  2014     1
#>  9      0      1  2014     2
#> 10      0      1  2015     0
#> # … with 14 more rows

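Then predict for this new example data on each of your trained models. (I used augment() from broom here so that the new predicted column gets added on to the existing columns, but you could also use predict(). Note that augment() returns predictions on the link scale by default for a glm; pass type.predict = "response" if you want probabilities.)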

mods %>%
    mutate(preds = map(model, augment, newdata = new_dat))
#> # A tibble: 2 × 4
#>    year data                 model  preds            
#>   <int> <list>               <list> <list>           
#> 1  2014 <tibble [5,000 × 3]> <glm>  <tibble [24 × 5]>
#> 2  2015 <tibble [5,000 × 3]> <glm>  <tibble [24 × 5]>

Created on 2021-11-15 by the reprex package (v2.0.1)
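For the predict() route mentioned above, here is a minimal sketch under the same simulated setup. The seed, the full 0:6 grid, and the `.prob` column name are assumptions made for illustration; `type = "response"` asks for predicted probabilities rather than values on the probit link scale.

```r
library(tidyverse)

set.seed(123)  # assumption: seed added only to make the sketch reproducible
n <- 10000
dta <- tibble(
  year   = rep(2014:2015, length.out = n),
  group  = sample(0:6, size = n, replace = TRUE),
  female = sample(0:1, size = n, replace = TRUE),
  smoker = sample(0:1, size = n, replace = TRUE)
)

# one probit model per year, as in the answer
mods <- dta %>%
  nest(data = c(-year)) %>%
  mutate(model = map(data, ~ glm(smoker ~ female * group, data = .x,
                                 family = binomial(link = "probit"))))

# grid covering all seven groups for both values of female
new_dat <- crossing(female = 0:1, group = 0:6)

# attach predicted probabilities to a copy of the grid for each model;
# .prob is a hypothetical column name chosen for this sketch
preds <- mods %>%
  mutate(preds = map(model, ~ mutate(
    new_dat,
    .prob = unname(predict(.x, newdata = new_dat, type = "response"))
  ))) %>%
  select(year, preds) %>%
  unnest(preds)

preds
```

The result has one row per year, female value, and group, which is the same shape augment() produces, just with a column name of your choosing.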

Once you have these predictions, you can unnest() them and then handle them however you like.
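One way to get to the seven predicted columns asked for in the question is to unnest the predictions and reshape them with pivot_wider(). This is a sketch rather than the only approach. It assumes the same simulated data, a grid over all seven groups, and predictions on the response scale; the `pred_group` prefix and the `wide_preds` and `dta_pred` names are made up for illustration. Because the model only uses female and group, the predictions can be joined back to the original rows by year and female.

```r
library(tidyverse)
library(broom)

set.seed(123)  # assumption: seed added only to make the sketch reproducible
n <- 10000
dta <- tibble(
  year   = rep(2014:2015, length.out = n),
  group  = sample(0:6, size = n, replace = TRUE),
  female = sample(0:1, size = n, replace = TRUE),
  smoker = sample(0:1, size = n, replace = TRUE)
)

# one probit model per year, as in the answer
mods <- dta %>%
  nest(data = c(-year)) %>%
  mutate(model = map(data, ~ glm(smoker ~ female * group, data = .x,
                                 family = binomial(link = "probit"))))

# every value of female under every counterfactual group value
new_dat <- crossing(female = 0:1, group = 0:6)

# predict probabilities, unnest, then spread groups into columns
wide_preds <- mods %>%
  mutate(preds = map(model, augment, newdata = new_dat,
                     type.predict = "response")) %>%
  select(year, preds) %>%
  unnest(preds) %>%
  pivot_wider(id_cols = c(year, female),
              names_from = group, values_from = .fitted,
              names_prefix = "pred_group")

# attach the seven predicted columns to the original rows
dta_pred <- left_join(dta, wide_preds, by = c("year", "female"))

dta_pred
```

Here `dta_pred` keeps the original rows and adds `pred_group0` through `pred_group6`, each holding the predicted probability of smoking if that row's group were set to the given value.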