Search code examples
rtidymodels

In `tidymodels` how do I do an F test to compare two models?


In base R it is easy to compare two models with the anova() function and get an F test.

library(MASS)
lm.fit1 <- lm(medv ~ . , data = Boston)
lm.fit1a <- update(lm.fit1, ~ . - age - black)

anova(lm.fit1a, lm.fit1)

If I am working with tidymodels workflows. How do I do the same comparison? I have code like this:

library(tidymodels)
lm_spec <- linear_reg() %>%
  set_mode("regression") %>%
  set_engine("lm")

the_rec <- recipe(medv ~ ., data = Boston)

the_workflow <- workflow() %>% 
  add_recipe(the_rec) %>% 
  add_model(lm_spec)

the_workflow_fit1 <- 
  fit(the_workflow, data = Boston)
tidy(the_workflow_fit1)


the_workflow_fit1a <- 
  the_workflow_fit1  %>% 
  update_recipe(the_rec %>% step_rm(age, black)) %>% 
  fit(data = Boston) 
tidy(the_workflow_fit1a)

I don't know how to extract the right object (thingy) to feed a statement like this:

anova(the_workflow_fit1a$thingy, the_workflow_fit1$thingy)

What is the thingy I need? Is there an elegant way to do this inside of the tidymodels ecosystem?


Solution

  • Many hours later and a post from @juliasilge https://github.com/tidymodels/workflows/issues/54 which introduced me to pull_workflow_fit() I have a tidymodels solution.

    The base R code:

    library(MASS)
    lm.fit1 <- lm(medv ~ . , data = Boston)
    lm.fit1a <- update(lm.fit1, ~ . - age - black)
    anova(lm.fit1a, lm.fit1)
    

    Can be done in tidymodels with:

    library(tidymodels)
    lm_spec <- linear_reg() %>%
      set_mode("regression") %>%
      set_engine("lm")
    
    the_rec <- recipe(medv ~ ., data = Boston)
    
    the_workflow <- workflow() %>% 
      add_recipe(the_rec) %>% 
      add_model(lm_spec)
    
    the_workflow_fit1 <- 
      fit(the_workflow, data = Boston) %>% 
      extract_fit_parsnip()
    
    the_workflow_fit1a <- 
      the_workflow  %>% 
      update_recipe(
        the_rec %>% step_rm(age, black)
      ) %>% 
      fit(data = Boston) %>% 
      extract_fit_parsnip()
    
    anova(the_workflow_fit1a$fit, the_workflow_fit1$fit)