
How to perform spatial cross-validation using mlr3 and then predict on a raster


I want to build a model for land cover classification. My data are multitemporal remote sensing data with several bands. For training, I created stratified random points and extracted the spectral data at their positions. With these data I trained a classification tree (rpart) using the mlr3 package. To measure accuracy, I ran a repeated spatial cross-validation using mlr3spatiotempcv.

After extraction, the model from the training step is stored in an R object of class rpart. Its terms field stores the variable names: all the bands I used, but also the spatial x and y coordinates. This causes problems when predicting new data. I used the terra package and got an error that the x and y layers are missing from my input data. That kind of makes sense, because they are stored in the terms field of the model. But as I understand it, the coordinates should not be variables of the model; they are only used for spatial resampling, not for prediction.

I "solved" this problem by removing the x and y coordinates during training and running an ordinary non-spatial cross-validation instead. After that, the prediction works perfectly.

So my question is: how can I use the mlr3 package to train a model on data that contain coordinates, evaluate it with spatial cross-validation, and then use this model to predict a new raster?


Solution

  • You have found a bug. When the task is created from a data.frame instead of an sf object, coords_as_features is set to TRUE, although the default should be FALSE. You can install a fixed version of the package with remotes::install_github("mlr-org/mlr3spatiotempcv"). The fix should be included in the next CRAN release. Thanks for reporting.

    This brings problems when predicting new data.

    Why do you use the models from resampling to predict new data? Usually, you estimate the performance of the final model with (spatial) cross-validation, but the final model used to predict new data is fitted on the complete data set.
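The workflow described in this answer could look roughly like the sketch below. It is only an illustration under several assumptions: the raster, the band names (band1, band2), the class labels, and the CRS are synthetic placeholders, and it assumes a recent mlr3spatiotempcv version that provides as_task_classif_st(). Setting coords_as_features = FALSE explicitly keeps x and y out of the feature set regardless of the installed version's default. The final prediction uses the plain rpart predict method on the raster values; terra's own predict method for rasters may work just as well with the same model.

```r
library(mlr3)
library(mlr3spatiotempcv)
library(terra)

set.seed(42)

# Hypothetical two-band raster standing in for the real imagery
r = rast(nrows = 30, ncols = 30, nlyrs = 2,
         xmin = 0, xmax = 3000, ymin = 0, ymax = 3000,
         crs = "EPSG:32632")
values(r) = matrix(runif(ncell(r) * 2), ncol = 2)
names(r) = c("band1", "band2")

# Synthetic training points: band values plus x/y coordinates of sampled cells
cells = sample(ncell(r), 300)
pts = data.frame(values(r)[cells, ], xyFromCell(r, cells))
pts$landcover = factor(ifelse(pts$band1 > 0.5, "forest", "water"))

# Spatial task: coordinates are used for resampling only, not as features
task = as_task_classif_st(
  pts,
  target = "landcover",
  coordinate_names = c("x", "y"),
  crs = "EPSG:32632",
  coords_as_features = FALSE
)

# Estimate performance with repeated spatial cross-validation
learner = lrn("classif.rpart")
resampling = rsmp("repeated_spcv_coords", folds = 3, repeats = 2)
rr = resample(task, learner, resampling)
acc = rr$aggregate(msr("classif.acc"))

# Fit the final model on the complete data set
learner$train(task)
model = learner$model

# Predict every raster cell with the rpart model and write the class codes
# into an empty single-layer raster with the same geometry
newdata = as.data.frame(values(r))
cls = stats::predict(model, newdata = newdata, type = "class")
out = rast(r, nlyrs = 1)
values(out) = as.integer(cls)  # integer codes index into levels(cls)
names(out) = "landcover_code"
```

With the coordinates excluded from the features, attr(model$terms, "term.labels") should list only the band names, so new data for prediction no longer needs x and y layers.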