I'm creating and fitting a workflow for a lasso regression model in {tidymodels}. The model fits fine, but when I go to predict the test set I get an error saying "the following required column is missing from `new_data`". That column ("price") is in both the train and test sets. Is this a bug? What am I missing?
Any help would be greatly appreciated.
library(tidymodels)

# split the data (target variable in house_sales_df is "price")
split <- initial_split(house_sales_df, prop = 0.8)
train <- split %>% training()
test <- split %>% testing()
# create and fit workflow
lasso_prep_recipe <-
  recipe(price ~ ., data = train) %>%
  step_zv(all_predictors()) %>%
  step_normalize(all_numeric())
lasso_model <-
  linear_reg(penalty = 0.1, mixture = 1) %>%
  set_engine("glmnet")
lasso_workflow <- workflow() %>%
  add_recipe(lasso_prep_recipe) %>%
  add_model(lasso_model)
lasso_fit <- lasso_workflow %>%
  fit(data = train)
# predict test set
predict(lasso_fit, new_data = test)
Calling predict() results in this error:
Error in `step_normalize()`:
! The following required column is missing from `new_data` in step 'normalize_MXQEf': price.
Backtrace:
1. stats::predict(lasso_fit, new_data = test, type = "numeric")
2. workflows:::predict.workflow(lasso_fit, new_data = test, type = "numeric")
3. workflows:::forge_predictors(new_data, workflow)
5. hardhat:::forge.data.frame(new_data, blueprint = mold$blueprint)
7. hardhat:::run_forge.default_recipe_blueprint(...)
8. hardhat:::forge_recipe_default_process(...)
10. recipes:::bake.recipe(object = rec, new_data = new_data)
12. recipes:::bake.step_normalize(step, new_data = new_data)
13. recipes::check_new_data(names(object$means), object, new_data)
14. cli::cli_abort(...)
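You are getting the error because all_numeric() in step_normalize() selects the outcome price, which isn't available at predict time: when predicting from a workflow, only the predictors are passed through the recipe (note forge_predictors() in the backtrace), so the step cannot find the price column it learned a mean and standard deviation for. Use all_numeric_predictors() instead and you should be good: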
library(tidymodels)

# split the data (target variable in house_sales_df is "price")
split <- initial_split(house_sales_df, prop = 0.8)
train <- split %>% training()
test <- split %>% testing()
# create and fit workflow
lasso_prep_recipe <-
  recipe(price ~ ., data = train) %>%
  step_zv(all_predictors()) %>%
  # normalize only the numeric predictors, not the outcome
  step_normalize(all_numeric_predictors())
lasso_model <-
  linear_reg(penalty = 0.1, mixture = 1) %>%
  set_engine("glmnet")
lasso_workflow <- workflow() %>%
  add_recipe(lasso_prep_recipe) %>%
  add_model(lasso_model)
lasso_fit <- lasso_workflow %>%
  fit(data = train)
# predict test set
predict(lasso_fit, new_data = test)
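
If you want to see the difference between the two selectors for yourself, here is a minimal sketch. It assumes the built-in mtcars data as a stand-in for house_sales_df, with mpg playing the role of price; the object names (rec_all_numeric, rec_predictors_only, and so on) are made up for illustration. It reads the `means` element of the prepped step, the same element that shows up in the backtrace above.

```r
library(recipes)

# mtcars stands in for house_sales_df; mpg plays the role of price (assumption)
rec_all_numeric <- recipe(mpg ~ ., data = mtcars) %>%
  step_normalize(all_numeric()) %>%
  prep()

rec_predictors_only <- recipe(mpg ~ ., data = mtcars) %>%
  step_normalize(all_numeric_predictors()) %>%
  prep()

# Columns each normalize step learned a mean for during prep()
cols_all_numeric <- names(rec_all_numeric$steps[[1]]$means)
cols_predictors_only <- names(rec_predictors_only$steps[[1]]$means)

"mpg" %in% cols_all_numeric      # TRUE: the outcome was selected
"mpg" %in% cols_predictors_only  # FALSE: only predictors were selected

# Baking new data that lacks the outcome works with the predictors-only recipe
new_obs <- mtcars[1:3, setdiff(names(mtcars), "mpg")]
baked <- bake(rec_predictors_only, new_data = new_obs)
```

Because the first recipe learned a mean for mpg, baking data without an mpg column fails with it, which is the same situation predict() runs into in the question.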