tidymodels broom

tidy() function can't process last_fit() objects


Functions like last_fit() from the tune package produce last_fit objects, which are large nested lists containing the fit results. I tried to transform them into data frames using the tidy() function from the broom package, but this resulted in an error (shown after the MRE):

MRE:

library(tidymodels)
library(tidyverse)

data <- mtcars

# Train/test split that last_fit() needs as its second argument
data_split <- initial_split(data)

model_default <-
  parsnip::boost_tree(
    mode = "regression"
  ) %>%
  set_engine("xgboost", objective = "reg:squarederror")

wf <- workflow() %>%
  add_model(model_default) %>%
  add_recipe(recipe(mpg ~ ., data))

lf <- last_fit(wf, data_split)
tidy_lf <- tidy(lf)

Error in var(if (is.vector(x) || is.factor(x)) x else as.double(x), na.rm = na.rm) : 
is.atomic(x) is not TRUE
In addition: Warning messages:
1: Data frame tidiers are deprecated and will be removed in an upcoming release of broom. 
2: In mean.default(X[[i]], ...) :
  argument is not numeric or logical: returning NA
3: In mean.default(X[[i]], ...) :
  argument is not numeric or logical: returning NA
4: In mean.default(X[[i]], ...) :
  argument is not numeric or logical: returning NA
5: In mean.default(X[[i]], ...) :
  argument is not numeric or logical: returning NA
6: In mean.default(X[[i]], ...) :
  argument is not numeric or logical: returning NA
7: In mean.default(X[[i]], ...) :
  argument is not numeric or logical: returning NA

Question: How can I use tidy() with a last_fit() output?


Solution

  • The object that last_fit() creates is a tibble (containing metrics, predictions, etc.), not a model that can be tidied. Use extract_workflow() to pull the fitted workflow out of the object created by last_fit(); that fitted workflow can then be tidied:

    library(tidymodels)
    
    car_split <- initial_split(mtcars)
    
    wf <- workflow() %>%
        add_model(linear_reg()) %>%
        add_recipe(recipe(mpg ~ ., mtcars))
    
    lf <- last_fit(wf, car_split)
    lf
    #> # Resampling results
    #> # Manual resampling 
    #> # A tibble: 1 × 6
    #>   splits         id               .metrics .notes   .predictions     .workflow 
    #>   <list>         <chr>            <list>   <list>   <list>           <list>    
    #> 1 <split [24/8]> train/test split <tibble> <tibble> <tibble [8 × 4]> <workflow>
    
    lf %>%
        extract_workflow() %>%
        tidy()
    #> # A tibble: 11 × 5
    #>    term         estimate std.error statistic p.value
    #>    <chr>           <dbl>     <dbl>     <dbl>   <dbl>
    #>  1 (Intercept) -33.6       36.0      -0.935   0.367 
    #>  2 cyl          -0.0296     1.34     -0.0221  0.983 
    #>  3 disp          0.0252     0.0269    0.934   0.367 
    #>  4 hp           -0.00539    0.0319   -0.169   0.868 
    #>  5 drat         -0.167      2.54     -0.0659  0.948 
    #>  6 wt           -5.69       2.79     -2.04    0.0623
    #>  7 qsec          3.32       1.76      1.89    0.0820
    #>  8 vs           -4.40       3.80     -1.16    0.268 
    #>  9 am            2.54       2.67      0.950   0.360 
    #> 10 gear          2.69       2.28      1.18    0.259 
    #> 11 carb         -0.0486     1.11     -0.0439  0.966
    

    Created on 2022-03-23 by the reprex package (v2.0.1)
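
    Since the last_fit() result is a tibble of list-columns, the other pieces it holds can be pulled out in a similar way. The following is a minimal sketch, assuming the same linear_reg() workflow as above; the set.seed() call and the object names metrics, preds and coefs are my own additions for illustration, not part of the original answer. It uses collect_metrics() and collect_predictions() from tune for the test-set results, and extract_workflow() plus tidy() for the coefficients:

```r
library(tidymodels)

# Assumption: a seed is set only so the random split is reproducible
set.seed(123)
car_split <- initial_split(mtcars)

wf <- workflow() %>%
    add_model(linear_reg()) %>%
    add_recipe(recipe(mpg ~ ., mtcars))

lf <- last_fit(wf, car_split)

# Test-set performance metrics, one row per metric
metrics <- collect_metrics(lf)

# Test-set predictions, one row per held-out observation
preds <- collect_predictions(lf)

# Model coefficients, taken from the fitted workflow as in the answer above
coefs <- lf %>%
    extract_workflow() %>%
    tidy()
```

    Each of the three results is an ordinary tibble, so it can be passed on to dplyr or ggplot2 directly. Whether tidy() works on the extracted workflow depends on whether a tidier exists for the underlying model type.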