Multiclass tidymodel - class of outcome variable?


I want to do multiclass classification, and my outcome variable is a character vector with three distinct values ("CD", "UC", "IBS").

How can I convert my outcome variable into a factor, or whatever else the model will accept?

My model code:

  library(tidymodels)

  # Boosted-tree classifier; diagnosis is still a character column here
  boost_tree(trees = 50) %>%
    set_engine("xgboost") %>%
    set_mode("classification") %>%
    fit(diagnosis ~ ., data = train)

  Error in check_outcome():
  ! For a classification model, the outcome should be a factor.
  Backtrace:
  1. ... %>% fit(diagnosis ~ ., data = train)
  2. parsnip::fit.model_spec(., diagnosis ~ ., data = train)
  3. parsnip:::form_xy(...)
  4. parsnip:::check_outcome(env$y, object)

Thanks a lot!


Solution

  • Convert the outcome to a factor before you do anything else (such as data splitting or resampling):

    train$diagnosis <- factor(train$diagnosis)
    

    See the help page for factor() (?factor); it has other options, such as setting the order of the factor levels.
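
As a minimal sketch of the advice above, the conversion can be done on the full data set first and the split made afterwards. The data frame dat, its marker column, and the base-R split with sample() are all hypothetical stand-ins for illustration; only the diagnosis column and its three values come from the question.

```r
# Hypothetical toy data standing in for the full data set (before any splitting)
dat <- data.frame(
  diagnosis = c("CD", "UC", "IBS", "UC", "CD", "IBS"),
  marker    = c(1.2, 0.7, 3.1, 0.9, 1.5, 2.8),
  stringsAsFactors = FALSE
)

# Convert the character outcome to a factor, with an explicit level order
dat$diagnosis <- factor(dat$diagnosis, levels = c("CD", "UC", "IBS"))

# Split only after the conversion, so every subset carries the same levels
set.seed(123)
train_idx <- sample(nrow(dat), size = 4)
train <- dat[train_idx, ]
test  <- dat[-train_idx, ]

# Check the result: the outcome is a factor and both subsets share its levels
is.factor(train$diagnosis)
levels(train$diagnosis)
levels(test$diagnosis)
```

Calling factor() separately on each subset would keep only the values present in that subset, so a class that is missing from one split would also be missing from its levels. Converting once, up front, avoids that mismatch.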