r tidymodels plumber vetiver

Why does deploying a tidymodel with vetiver throw an error when there's a variable with role as ID?


I'm unable to deploy a tidymodel with vetiver and get a prediction when the model's recipe includes a variable with the role ID. The API returns the following error:

{ "error": "500 - Internal server error", "message": "Error: The following required columns are missing: 'Fake_ID'.\n" }

The code for the dummy example is below. Do I need to remove the ID-variable from both the model and recipe to make the Plumber API work?

# Load libraries
library(recipes)
library(parsnip)
library(workflows)
library(pins)
library(plumber)
library(stringi)
library(vetiver)  # needed for vetiver_pin_write() and vetiver_api() below


# Load data
data(Sacramento, package = "modeldata")


# Create fake IDs for testing
Sacramento$Fake_ID <- stri_rand_strings(nrow(Sacramento), 10)


# Train model
Sacramento_recipe <- recipe(formula = price ~ type + sqft + beds + baths + zip + Fake_ID, data = Sacramento) %>% 
  update_role(Fake_ID, new_role = "ID") %>% 
  step_zv(all_predictors())

rf_spec <- rand_forest(mode = "regression") %>% set_engine("ranger")

rf_fit <-
  workflow() %>%
  add_model(rf_spec) %>%
  add_recipe(Sacramento_recipe) %>%
  fit(Sacramento)


# Create vetiver object
v <- vetiver::vetiver_model(rf_fit, "sacramento_rf")
v


# Allow for model versioning and sharing
model_board <- board_temp()
model_board %>% vetiver_pin_write(v)


# Deploying model
pr() %>%
  vetiver_api(v) %>%
  pr_run(port = 8088)



Solution

  • As of today, vetiver builds the ptype (the prototype of the input data) from the "mold", workflows::extract_mold(rf_fit), and takes only the predictors from it. Predicting from a workflow, however, requires all the variables the recipe was given, including non-predictors such as an ID column. That mismatch is what produces the "required columns are missing" error. If you have trained a model with non-predictors, as of today you can make the API work by passing in a custom ptype:

    library(recipes)
    #> Loading required package: dplyr
    #> 
    #> Attaching package: 'dplyr'
    #> The following objects are masked from 'package:stats':
    #> 
    #>     filter, lag
    #> The following objects are masked from 'package:base':
    #> 
    #>     intersect, setdiff, setequal, union
    #> 
    #> Attaching package: 'recipes'
    #> The following object is masked from 'package:stats':
    #> 
    #>     step
    library(parsnip)
    library(workflows)
    library(pins)
    library(plumber)
    library(stringi)
    
    data(Sacramento, package = "modeldata")
    Sacramento$Fake_ID <- stri_rand_strings(nrow(Sacramento), 10)
    
    
    Sacramento_recipe <- 
        recipe(formula = price ~ type + sqft + beds + baths + zip + Fake_ID, 
               data = Sacramento) %>% 
        update_role(Fake_ID, new_role = "ID") %>% 
        step_zv(all_predictors())
    
    rf_spec <- rand_forest(mode = "regression") %>% set_engine("ranger")
    
    rf_fit <-
        workflow() %>%
        add_model(rf_spec) %>%
        add_recipe(Sacramento_recipe) %>%
        fit(Sacramento)
    
    
    library(vetiver)
    ## this is probably easiest because this model uses a simple formula
    ## if there is more complex preprocessing, select the variables
    ## from `Sacramento` via dplyr or similar
    sac_ptype <- extract_recipe(rf_fit) %>% 
        bake(new_data = Sacramento, -all_outcomes()) %>% 
        vctrs::vec_ptype()
    
    ## newer vetiver releases name this argument `save_prototype`;
    ## `save_ptype` is deprecated there
    v <- vetiver_model(rf_fit, "sacramento_rf", save_ptype = sac_ptype)
    v
    #> 
    #> ── sacramento_rf ─ <butchered_workflow> model for deployment 
    #> A ranger regression modeling workflow using 6 features
    
    pr() %>%
        vetiver_api(v)
    #> # Plumber router with 2 endpoints, 4 filters, and 0 sub-routers.
    #> # Use `pr_run()` on this object to start the API.
    #> ├──[queryString]
    #> ├──[body]
    #> ├──[cookieParser]
    #> ├──[sharedSecret]
    #> ├──/ping (GET)
    #> └──/predict (POST)
    

    Created on 2022-03-10 by the reprex package (v2.0.1)
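
The mismatch described above can be checked locally, without starting the API. The following is only a minimal sketch under a few assumptions that are not in the original post: it swaps in `linear_reg()` (the `lm` engine) for the ranger model and uses fewer predictors so it runs quickly, it replaces the random IDs with deterministic ones built by `sprintf()`, and the names `lm_fit`, `mold_cols`, `new_homes` and `raw_ptype` are made up for illustration. It also shows the "select the variables via dplyr" route mentioned in the code comments for building the prototype from the raw data.

```r
library(recipes)    # also attaches dplyr, which supplies %>% and select()
library(parsnip)
library(workflows)

data(Sacramento, package = "modeldata")
# Deterministic stand-in for the random IDs used above (assumption)
Sacramento$Fake_ID <- sprintf("id_%04d", seq_len(nrow(Sacramento)))

rec <- recipe(price ~ sqft + beds + baths + Fake_ID, data = Sacramento) %>%
  update_role(Fake_ID, new_role = "ID")

lm_fit <- workflow() %>%
  add_model(linear_reg()) %>%   # lm engine, swapped in for ranger to keep this fast
  add_recipe(rec) %>%
  fit(Sacramento)

# Predictor columns recorded in the mold: the ID column is not among them
mold_cols <- names(extract_mold(lm_fit)$predictors)

new_homes <- head(Sacramento, 3)

# Predicting from predictor-only data fails and names the missing column
err <- tryCatch(
  predict(lm_fit, new_homes[, mold_cols]),
  error = function(e) conditionMessage(e)
)

# A prototype selected from the raw data keeps the ID column
raw_ptype <- Sacramento %>%
  select(sqft, beds, baths, Fake_ID) %>%
  vctrs::vec_ptype()

# New data shaped like that prototype predicts without error
preds <- predict(lm_fit, new_homes[, names(raw_ptype)])
```

If the sketch behaves as expected, `err` holds an error message mentioning `Fake_ID`, while `preds` has one row per new home. A prototype like `raw_ptype` is the kind of object that would be passed to `vetiver_model()` in place of `sac_ptype`.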

    Are you training models for production with non-predictor variables? Would you mind opening an issue on GitHub to explain your use case a little more?
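
On the original question of whether the ID variable has to be removed: one possible alternative, not part of the answer above, is to keep the ID in the recipe but declare that its role is not required at prediction time. This sketch assumes a recipes release that provides `update_role_requirements()` together with compatible versions of hardhat and workflows. As before, the linear model, the reduced predictor set and the deterministic IDs are illustrative substitutions, and `lm_fit`, `no_id` and `preds` are hypothetical names.

```r
library(recipes)
library(parsnip)
library(workflows)

data(Sacramento, package = "modeldata")
# Deterministic stand-in for the random IDs (assumption)
Sacramento$Fake_ID <- sprintf("id_%04d", seq_len(nrow(Sacramento)))

rec <- recipe(price ~ sqft + beds + baths + Fake_ID, data = Sacramento) %>%
  update_role(Fake_ID, new_role = "ID") %>%
  # Tell recipes that "ID" columns need not be present when baking new data
  update_role_requirements(role = "ID", bake = FALSE)

lm_fit <- workflow() %>%
  add_model(linear_reg()) %>%   # swapped in for ranger to keep the sketch fast
  add_recipe(rec) %>%
  fit(Sacramento)

# New data without the ID column
no_id <- head(Sacramento, 3)[, c("sqft", "beds", "baths")]
preds <- predict(lm_fit, no_id)
```

With the requirement relaxed, the workflow should predict from predictor-only data, so a predictor-only prototype would match what the workflow needs. Whether this suits a given deployment depends on whether the ID is needed downstream of the prediction.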