
How can we extract covariance matrix from a parsnip object?


I am trying to use the tidymodels ecosystem to perform econometric analysis. The example I am following at the moment is from the book “Principles of Econometrics with R” by Colonescu. The data from the book can be installed with

devtools::install_github("ccolonescu/PoEdata")

0.1 The Example

I am fitting a wage discrimination model that includes an interaction effect. The model is as follows:

library(tidymodels)
library(PoEdata)#to load the data
library(car)#For linearHypothesis function

Loading required package: carData

lm_model <- linear_reg() %>% 
     set_engine("lm")#model specification
data("cps4_small")
mod1 <- lm_model %>% 
     fit(wage~educ+black*female, data=cps4_small)#model fitting

0.2 The Issue

After creating the model, I want to test the hypothesis that there is no discrimination on the basis of gender or race. In other words, I need to test the joint hypothesis that the coefficients of black, female, and black:female are all zero at the same time. I want to use the linearHypothesis function from the car package for this.

hyp <- c("black=0", "female=0", "black:female=0")
tab <- tidy(linearHypothesis(mod1, hyp))

This gives me an error that there is no applicable method for vcov for an object of class _lm or model_fit.

So, how can I extract the covariance matrix from a parsnip object?


Solution

  • You need to use extract_fit_engine() to pull the underlying lm fit object out of the parsnip model object. Functions that expect an lm object, such as linearHypothesis(), then work on it as usual.

    library(tidymodels)
    library(PoEdata)
    library(car)
    
    data("cps4_small")
    
    lm_model <- linear_reg() %>% 
      set_engine("lm")
    
    mod1 <- lm_model %>% 
      fit(wage ~ educ + black * female, data = cps4_small)
    
    hyp <- c("black=0", "female=0", "black:female=0")
    
    mod1 %>%
      extract_fit_engine() %>%
      linearHypothesis(hyp) %>%
      tidy()
    #> # A tibble: 2 × 6
    #>   res.df     rss    df sumsq statistic  p.value
    #>    <dbl>   <dbl> <dbl> <dbl>     <dbl>    <dbl>
    #> 1    998 135771.    NA   NA       NA   NA      
    #> 2    995 130195.     3 5576.      14.2  4.53e-9
    

    Created on 2021-11-13 by the reprex package (v2.0.1)
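
To answer the title question directly: once the engine fit has been extracted, vcov() can be called on it like on any other lm object. Below is a minimal sketch of that. It uses the built-in mtcars data as a stand-in for cps4_small (an assumption, made only so the example runs without PoEdata), and the names fit_parsnip, lm_fit and V are illustrative.

```r
library(tidymodels)

# mtcars ships with R; it replaces cps4_small here purely for illustration
fit_parsnip <- linear_reg() %>%
  set_engine("lm") %>%
  fit(mpg ~ wt + hp, data = mtcars)

# The parsnip wrapper is not itself an lm object ...
class(fit_parsnip)

# ... so extract the underlying lm fit first, then ask for its covariance matrix
lm_fit <- extract_fit_engine(fit_parsnip)
V <- vcov(lm_fit)
V

# Coefficient standard errors are the square roots of the diagonal
sqrt(diag(V))
```

The same pattern should carry over to the wage model above: `vcov(extract_fit_engine(mod1))`.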