I have a random forest model, trained with the caret package, that uses both numeric and categorical predictors. I am trying to use this trained model to make predictions on a new dataset: a RasterStack containing one layer for each predictor. I converted the categorical raster layer to a factor using the ratify function in the raster package, and added character strings matching the training-set levels through a raster attribute table (RAT), but when I predict I get the following error:
# Error in predict.randomForest(modelFit, newdata) :
# Type of predictors in new data do not match that of the training data.
I think I might be mis-formulating the RAT somehow, or else I am misunderstanding the functionality of the RAT. Below is a minimal reproducible example. Any thoughts on what is going wrong?
require(caret)
require(raster)
set.seed(150)
data("iris")
# Training dataset
iris.x<-iris[,1:4]
iris.x$Cat<-"Low"
iris.x$Cat[1:60]<-"High"
iris.x$Cat<-as.factor(as.character(iris.x$Cat))
iris.y<-iris$Species
# Train RF model in Caret
# 5-fold cross-validation; the control object is passed to train() below
ctrl <- trainControl(method = "cv", number = 5, p = 0.9)
mod <- train(iris.x, iris.y,
             method = "rf",
             trControl = ctrl)
# Create raster stack prediction dataset
r <- raster(ncol=10, nrow=5)
tt <- sapply(1:4, function(x) setValues(r, round(runif(ncell(r),1,5))))
#Categorical raster layer with RAT
r_cat<-raster(ncol=10, nrow=5)
r_cat[1:25]<-1
r_cat[26:50]<-2
ratr_cat <- ratify(r_cat)
rat <- levels(ratr_cat)[[1]]
rat$PCN <- c(1,2)
rat$PCN_level <- c('Low','High')
levels(ratr_cat) <- rat
#Stack raster layers
t.stack <- stack(c(tt,ratr_cat),RAT = TRUE)
#Make sure names in stack match training dataset
names(t.stack)<-c('Sepal.Length','Sepal.Width', 'Petal.Length', 'Petal.Width','Cat')
#Ensure that categorical layer still has RAT and is a factor
t.stack[['Cat']] #yep
is.factor(t.stack[['Cat']]) #yep
#Predict new data using model
mod_pred <- predict(t.stack, mod)
The factor RasterLayer (attribute layer) seems to be, or at least to be handled like, an ordered factor. So you just have to train the model with an ordered factor instead of an unordered one. You can achieve this by changing one line:
iris.x$Cat<- ordered(as.character(iris.x$Cat), levels = c("Low", "High"))
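To see what that one-line change does, the sketch below compares the original encoding with the ordered one using base R only (no caret or raster needed). The names cat_default and cat_ordered are introduced here purely for illustration. It only demonstrates how the two encodings differ; it does not show what predict does internally with the raster layer.

```r
data("iris")

# Rebuild the categorical column exactly as in the question
iris.x <- iris[, 1:4]
iris.x$Cat <- "Low"
iris.x$Cat[1:60] <- "High"

# Original encoding: unordered factor, levels sorted alphabetically
cat_default <- as.factor(as.character(iris.x$Cat))

# Suggested encoding: ordered factor, levels in the same order as the RAT
# (raster value 1 = 'Low', raster value 2 = 'High')
cat_ordered <- ordered(as.character(iris.x$Cat), levels = c("Low", "High"))

levels(cat_default)      # level order chosen by as.factor()
levels(cat_ordered)      # level order given explicitly
is.ordered(cat_default)
is.ordered(cat_ordered)

# Integer codes for one "High" row (1) and one "Low" row (61)
as.integer(cat_default)[c(1, 61)]
as.integer(cat_ordered)[c(1, 61)]
```

With the ordered version, "Low" gets code 1 and "High" gets code 2, which lines up with the cell values 1 and 2 assigned to r_cat in the question. With the default version the codes are reversed and the factor is unordered.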