I would like to specify "sum to zero" contrasts for two predictors in a linear model using a tidymodels recipe. Is it possible? Looking at the recipes documentation, it seems that before 1.3 there were attempts to build variable-specific options, but the strategy was shifted to a global option.
I am trying to convert this base R code to tidymodels:
Bikeshare <- ISLR2::Bikeshare # start with original data
contrasts(Bikeshare$hr) <- contr.sum(24)
contrasts(Bikeshare$mnth) <- contr.sum(12)
mod.lm2 <-
  lm(
    bikers ~ mnth + hr + workingday + temp + weathersit,
    data = Bikeshare
  )
summary(mod.lm2)
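For readers less familiar with per-factor contrasts, here is a minimal base R sketch of what the code above relies on. The toy data frame `d` and its columns `a` and `b` are made up for illustration and are not part of the Bikeshare data. Assigning `contr.sum()` to one factor changes the coding of that factor only; the other factor keeps R's default treatment coding.

```r
# Toy data (hypothetical, for illustration only)
d <- data.frame(
  a = factor(c("x", "y", "z")),
  b = factor(c("p", "q", "p"))
)

# Sum-to-zero coding for a 3-level factor: each column sums to zero
sum_coding <- contr.sum(3)
colSums(sum_coding)

# Attach the coding to `a` only; `b` is left at the session default
contrasts(d$a) <- sum_coding

# `a` gets numbered sum-contrast columns, `b` keeps treatment coding
design_cols <- colnames(model.matrix(~ a + b, data = d))
design_cols
#> [1] "(Intercept)" "a1"          "a2"          "bq"
```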
I got this far:
library(tidymodels)
Bikeshare <- ISLR2::Bikeshare # start with original data
contrasts(Bikeshare$hr) <- contr.sum(24)
contrasts(Bikeshare$mnth) <- contr.sum(12)
lm_spec <- linear_reg() %>%
  set_engine("lm")
the_rec <-
  recipe(
    bikers ~ mnth + hr + workingday + temp + weathersit,
    data = Bikeshare
  ) %>%
  step_dummy(c(mnth, hr), one_hot = TRUE)
the_workflow <- workflow() %>%
  add_recipe(the_rec) %>%
  add_model(lm_spec)
the_workflow_fit_lm_fit <-
  fit(the_workflow, data = Bikeshare) %>%
  extract_fit_parsnip()
summary(the_workflow_fit_lm_fit$fit)
Does anybody know how to get the same results out of a tidymodels workflow?
I don't think I can use contr.sum as a global option. It gives me the betas I would like for two of the variables, but it also changes the contrasts on the others:
Bikeshare <- ISLR2::Bikeshare # be sure to work with original data
old_opt <- options()$contrasts
options(contrasts = c('contr.sum', 'contr.poly'))
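To make that side effect concrete, here is a small base R sketch. The toy data frame `d` is hypothetical and stands in for the real predictors. With the global option set, every unordered factor without its own contrasts gets sum-to-zero coding, not just the ones you care about. It also shows restoring the previous option afterwards, using the value that `options()` returns.

```r
# Toy data (hypothetical, for illustration only)
d <- data.frame(
  a = factor(c("x", "y", "z")),
  b = factor(c("p", "q", "p"))
)

# options() returns the previous values, which makes restoring easy
old_opt <- options(contrasts = c("contr.sum", "contr.poly"))

# Both factors are now sum-coded
global_cols <- colnames(model.matrix(~ a + b, data = d))
global_cols
#> [1] "(Intercept)" "a1"          "a2"          "b1"

# Restore whatever was set before
options(old_opt)

# Back to the previous coding (treatment contrasts in a default session)
default_cols <- colnames(model.matrix(~ a + b, data = d))
default_cols
```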
The docs for step_dummy() have:
"To change the type of contrast being used, change the global contrast option via options."
So there is no way, outside of global options, to change it.
We should probably have an example though :-/
Note that, for new samples, the options are read from the global option again. Make sure that they are set the same at prediction-time:
library(tidymodels)
#> Registered S3 method overwritten by 'tune':
#> method from
#> required_pkgs.model_spec parsnip
tidymodels_prefer()
data("penguins")
penguins <-
penguins %>%
distinct(species)
# R's defaults
old_opt <- options()$contrasts
old_opt
#> unordered ordered
#> "contr.treatment" "contr.poly"
# default contrast
default <-
recipe(~ species, data = penguins) %>%
step_dummy(species) %>%
prep()
default %>% bake(new_data = NULL)
#> # A tibble: 3 × 2
#> species_Chinstrap species_Gentoo
#> <dbl> <dbl>
#> 1 0 0
#> 2 0 1
#> 3 1 0
# Now set the contrasts to something else:
options(contrasts = c('contr.sum', 'contr.poly'))
with_opt <-
recipe(~ species, data = penguins) %>%
step_dummy(species) %>%
prep()
with_opt %>% bake(new_data = NULL)
#> # A tibble: 3 × 2
#> species_X1 species_X2
#> <dbl> <dbl>
#> 1 1 0
#> 2 -1 -1
#> 3 0 1
# reset options:
options(contrasts = old_opt)
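# the recipe was prepped with contr.sum, but baking re-reads the (now reset)
# global option, so the new samples get the default encoding instead:
with_opt %>% bake(new_data = penguins)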
#> # A tibble: 3 × 2
#> species_Chinstrap species_Gentoo
#> <dbl> <dbl>
#> 1 0 0
#> 2 0 1
#> 3 1 0
Created on 2021-11-16 by the reprex package (v2.0.0)
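One way to keep the option consistent between prep-time and bake-time is to scope it with the withr package, so it is never left changed in the session. This is only a sketch under assumptions: `with_sum_contrasts()` is a hypothetical helper (not part of recipes or tidymodels), the toy data frame is made up, and it assumes the behavior shown in the reprex above, where step_dummy() reads the global option at both prep and bake. Any factor passed through step_dummy(), or handled by the model itself inside the wrapped call, will still pick up the same option.

```r
library(tidymodels)

# Hypothetical helper: evaluate `code` with sum-to-zero contrasts for
# unordered factors; withr restores the previous option on exit.
with_sum_contrasts <- function(code) {
  withr::with_options(
    list(contrasts = c(unordered = "contr.sum", ordered = "contr.poly")),
    code
  )
}

# Toy data (hypothetical, for illustration only)
toy <- data.frame(species = factor(c("Adelie", "Chinstrap", "Gentoo")))

# Prep under the temporary option
rec <- with_sum_contrasts(
  recipe(~ species, data = toy) %>%
    step_dummy(species) %>%
    prep()
)

# Bake new samples under the same temporary option
baked <- with_sum_contrasts(bake(rec, new_data = toy))
baked
```

The same wrapper could be placed around fit() and predict() on a workflow, with the caveat from the original question that it applies to every factor encoded inside the call.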