Search code examples
rxgboostr-raster

Using parsnip model to predict a raster in R


I've been trying to predict a raster in R using an XGBoost model. I need to use raster::predict() because of the raster size. raster::predict(raster, xgboost_model, type="prob") and raster::predict(raster, xgboost_model, type="raw") work fine. But when I try to predict the classes, which is what I want to do using raster::predict(raster, xgboost_model, type="class"), I get an error:

> predicted<-raster::predict(raster, xgboost_model,  type="class")
Error in v[cells, ] <- predv : incorrect number of subscripts on matrix

Here's a reproducible example using tidymodels which is what I used to train my model. Just in case this is tidymodel specific.

library(raster)
library(tidymodels)
library(tidyverse)

## Make dummy raster with class as class to predict.
band1<-raster(ncol=10, nrow=10)
values(band1)<-runif(ncell(band1))

band2<-raster(ncol=10, nrow=10)
values(band2)<-runif(ncell(band2))

band3<-raster(ncol=10, nrow=10)
values(band3)<-runif(ncell(band3))

class<-raster(ncol=10, nrow=10)
values(class)<-floor(runif(ncell(class), min=1, max=5))


r<-stack(band1, band2, band3, class)

names(r)<-c("band1", "band2", "band3", "class")

## Convert raster to df for training. 
train<-getValues(r)%>%
  as_tibble()

## Tune and train model.
xgb_spec<-boost_tree(
  trees=50,
  tree_depth = tune(),
  min_n=tune(),
  loss_reduction=tune(),
  sample_size=tune(),
  mtry=tune(),
  learn_rate=tune()
)%>%
  set_engine("xgboost")%>%
  set_mode("classification")

xgb_grid<-grid_latin_hypercube(
  tree_depth(),
  min_n(),
  loss_reduction(),
  sample_size=sample_prop(),
  finalize(mtry(), select(train, -class)),
  learn_rate(),
  size=5
)

xgb_wf<-workflow()%>%
  add_formula(as.factor(class)~band1+band2+band3)%>%
  add_model(xgb_spec)

folds <- vfold_cv(train, v = 5)

xgb_res<-tune_grid(
  xgb_wf,
  resamples=folds,
  grid=xgb_grid,
  control=control_grid(save_pred=T)
)

best_auc<-select_best(xgb_res, "roc_auc")

final_xgb<-finalize_workflow(
  xgb_wf,
  best_auc
)

last_fit<-fit(final_xgb, train)

## remove class layer for test to simulate real world example
test<-r%>%
  dropLayer(4)

## This works
raster::predict(test, last_fit, type="prob")
## This doesn't
raster::predict(test, last_fit, type="class")

Error produced for type="class" is:

> raster::predict(test, last_fit, type="class")
Error in v[cells, ] <- predv : incorrect number of subscripts on matrix

I've googled my face off and the only way I've figure out how to predict class is to convert the raster to matrix and then add the predictions back into the raster. But this is really, really slow.

Thanks in advance.


Solution

  • Aha. I figured it out. The problem is that a model produced by the parsnip package always returns a tibble when the prediction type is type="class". raster.predict expects a matrix to be returned. You can get around this by providing a function to raster.predict that converts the returned parsnip::predicted model to a matrix.

    Here's how I predicted a raster using my original model created in my question:

    fun<-function(...){
      p<-predict(...)
      return(as.matrix(as.numeric(p[, 1, drop=T]))) 
    }
    
    raster::predict(test, last_fit, type="class", fun=fun)