Workflow Tidymodels Formula Object


I am fairly new to R and am following a guide to learn how to build an Expected Goals model for my hockey league. When I run the code below, I get the error at the bottom. Is there something simple that I am missing?

It seems like it's trying to use a formula in the model portion of the workflow, but I already have a recipe in there. Thanks in advance for any help anyone can offer! The guide is here: https://www.thesignificantgame.com/portfolio/expected-goals-model-with-tidymodels/

library(tidymodels)
library(tidyverse)
library(dplyr)

set.seed(1972)
train_test_split <- initial_split(data = EXPECTED_GOALS_MODEL, prop = 0.80)
train_data <- train_test_split %>% training() 
test_data  <- train_test_split %>% testing()
    
xg_recipe <- recipe(
  Goal ~ DistanceC + Angle + Home + Hand + AgeDec31 + GoalieAgeDec31 + NewX + NewY,
  data = train_data
) %>%
  update_role(NewX, NewY, new_role = "ID")
    
model <- logistic_reg() %>% set_engine("glm")
    
xg_wflow <- workflow() %>% add_model(model) %>% add_recipe(xg_recipe)

xg_wflow
    
xg_fit <- xg_wflow %>% fit(data = train_data)

Error in validObject(.Object) : 
  invalid class “model” object: invalid object for slot "formula" in class "model": got class "workflow", should be or extend class "formula"
In addition: Warning message:
In fit(., data = train_data) :
  fit failed: Error in as.matrix(y) : argument "y" is missing, with no default
 fit(x = ., data = train_data) 

Solution

  • It's difficult to tell exactly what the issue is without a reproducible example, though this error brings up a few questions for me:

    • Does the EXPECTED_GOALS_MODEL data indeed have a column called Goal in it, with two unique levels? Have you also spelled the remainder of the column names correctly?
    • Are your tidymodels package installs up to date?
    • Does this error persist if you specifically call generics::fit(data = train_data) instead of fit(data = train_data)? It almost looks as if a different fit() is being dispatched.
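The three questions above can be checked from the console. The following is only a rough sketch: `shots` is a hypothetical stand-in for EXPECTED_GOALS_MODEL with made-up values, and the `needed` vector is an assumption that should be replaced with the column names the recipe actually uses.

```r
library(tidymodels)

# Hypothetical stand-in for EXPECTED_GOALS_MODEL; swap in the real data frame.
shots <- tibble(
  Goal      = factor(c("no", "yes", "no", "no")),
  DistanceC = c(12.5, 4.0, 30.2, 18.7),
  Angle     = c(10, 35, 5, 22)
)

# 1. Is the outcome a factor with two levels, and do all recipe columns exist?
is.factor(shots$Goal)
nlevels(shots$Goal)
needed <- c("Goal", "DistanceC", "Angle")
setdiff(needed, names(shots))  # an empty result means no column is missing

# 2. Which package versions are installed?
packageVersion("tidymodels")
packageVersion("workflows")
packageVersion("parsnip")

# 3. Which fit() does R find first, and which attached packages define one?
#    A name other than "generics" here would suggest fit() is being masked.
environmentName(environment(fit))
find("fit")
```

If the last check points at a package other than generics, calling `generics::fit(xg_wflow, data = train_data)` explicitly sidesteps the masking.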

    Here's a place to start with a reprex:

    library(tidymodels)
    data(ames)
    
    set.seed(1972)
    ames <- ames %>% rowid_to_column()
    train_test_split <- initial_split(data = ames, prop = 0.80)
    train_data <- train_test_split %>% training() 
    test_data  <- train_test_split %>% testing()
    
    xg_recipe <- recipe(Sale_Price ~ ., data = train_data) %>% update_role(rowid, new_role = "ID")
    
    model <- linear_reg() %>% set_engine("glm")
    
    xg_wflow <- workflow() %>% add_model(model) %>% add_recipe(xg_recipe)
    
    xg_fit <- xg_wflow %>% fit(data = train_data)
    
    xg_fit
    #> ══ Workflow [trained] ══════════════════════════════════════════════════════════
    #> Preprocessor: Recipe
    #> Model: linear_reg()
    #> 
    #> ── Preprocessor ────────────────────────────────────────────────────────────────
    #> 0 Recipe Steps
    #> 
    #> ── Model ───────────────────────────────────────────────────────────────────────
    #> 
    #> Call:  stats::glm(formula = ..y ~ ., family = stats::gaussian, data = data)
    #> 
    #> Coefficients:
    #>                                          (Intercept)  
    #>                                           -2.583e+07  
    #>                  MS_SubClassOne_Story_1945_and_Older  
    #>                                            7.419e+03  
    #>    MS_SubClassOne_Story_with_Finished_Attic_All_Ages  
    #>                                            1.562e+04  
    #>    MS_SubClassOne_and_Half_Story_Unfinished_All_Ages  
    #>                                            1.060e+04  
    #>      MS_SubClassOne_and_Half_Story_Finished_All_Ages  
    #>                                            8.413e+03  
    #>                  MS_SubClassTwo_Story_1946_and_Newer  
    #>                                            3.007e+03  
    #>                  MS_SubClassTwo_Story_1945_and_Older  
    #>                                            1.793e+04  
    #>               MS_SubClassTwo_and_Half_Story_All_Ages  
    #>                                           -3.909e+03  
    #>                       MS_SubClassSplit_or_Multilevel  
    #>                                           -1.098e+04  
    #>                               MS_SubClassSplit_Foyer  
    #>                                           -4.038e+03  
    #>                MS_SubClassDuplex_All_Styles_and_Ages  
    #>                                           -2.004e+04  
    #>              MS_SubClassOne_Story_PUD_1946_and_Newer  
    #>                                           -2.335e+04  
    #>           MS_SubClassOne_and_Half_Story_PUD_All_Ages  
    #>                                           -2.482e+04  
    #>              MS_SubClassTwo_Story_PUD_1946_and_Newer  
    #>                                           -1.794e+04  
    #>          MS_SubClassPUD_Multilevel_Split_Level_Foyer  
    #>                                           -2.098e+04  
    #> MS_SubClassTwo_Family_conversion_All_Styles_and_Ages  
    #>                                            6.903e+03  
    #>                    MS_ZoningResidential_High_Density  
    #>                                           -3.853e+03  
    #>                     MS_ZoningResidential_Low_Density  
    #>                                           -3.661e+03  
    #>                  MS_ZoningResidential_Medium_Density  
    #>                                           -8.240e+03  
    #>                                       MS_ZoningA_agr  
    #>                                           -3.824e+03  
    #>                                       MS_ZoningC_all  
    #>                                           -1.800e+04  
    #>                                       MS_ZoningI_all  
    #>                                           -3.299e+04  
    #>                                         Lot_Frontage  
    #>                                            1.336e+01  
    #> 
    #> ...
    #> and 506 more lines.
    

    Created on 2022-09-28 by the reprex package (v2.0.1)
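For a version closer to the original question, the same workflow can be sketched with a two-level factor outcome and logistic_reg(). This assumes the `two_class_dat` example data from the modeldata package (attached by tidymodels) as a stand-in for the hockey data; the object names are illustrative, and `generics::fit()` is called explicitly so that a masked `fit()` cannot interfere.

```r
library(tidymodels)
data(two_class_dat)

set.seed(1972)
# Stand-in data: Class is a two-level factor, A and B are numeric predictors.
shots <- two_class_dat %>% rowid_to_column()
split <- initial_split(data = shots, prop = 0.80)
train_data <- training(split)
test_data  <- testing(split)

# rowid is kept in the data but excluded from the model via the "ID" role.
xg_recipe <- recipe(Class ~ A + B + rowid, data = train_data) %>%
  update_role(rowid, new_role = "ID")

model <- logistic_reg() %>% set_engine("glm")

xg_wflow <- workflow() %>% add_model(model) %>% add_recipe(xg_recipe)

# Call the generic by its namespace to rule out masking by another package.
xg_fit_lr <- generics::fit(xg_wflow, data = train_data)

# Class probabilities for the held-out rows.
preds <- predict(xg_fit_lr, new_data = test_data, type = "prob")
```

If this runs cleanly but the original code still fails, the difference is more likely in the data or in the attached packages than in the workflow itself.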

    Hope this helps!

    Simon, tidymodels team