I am using the Biomod2
package to run a series of species distribution models in R. One of the modelling techniques I am using is a classification tree analysis (CTA) which uses the rpart
package.
The response in these models are presence/absence of a plant species and the predictor variables are contained in a rasterStack
. Most of the variables in the rasterStack
are continuous numeric variables with the exception of one land cover variables, geology, which is a factor. I stacked each individual rasterLayer
and after, used as.factor()
to convert the geology layer to a factor.
I am running into an error message when trying to predict from the CTA. The CTA model was built with a data frame in which "geology" is a factor (see below) and used the raster predict
function on a rasterStack
("geology" is a factor, see below). However, running the predict
function, I get an error saying I supplied a numeric instead of a factor. I have checked all possible points to see if somehow "geology" get converted back to numeric but it seems to be a factor (as it should) everywhere I look.
EDIT: Changed data to make reproduceable.
library(raster)
library(rpart)
set.seed(123)
# Create sample rasterStack
data.rast <- stack(system.file("external/rlogo.grd", package = "raster"))
# Create one layer as a factor
data.rast$geology <- as.factor(sampleInt(7, length(data.rast$red), replace = TRUE))
# Create sample presence/absence data by randomly selecting cells of raster
data <- as.data.frame(data.rast)
data <- data[sample(nrow(data), 300, replace = FALSE), ]
data$pa <- as.factor(sample(0:1, nrow(data), replace = TRUE))
names(data)[4] <- "geology"
head(data)
# red green blue geology pa
#2463 251 255 255 7 1
#1944 191 190 186 5 0
#5016 162 174 226 7 0
#5771 255 255 253 4 1
#3739 204 205 199 7 0
#5483 131 133 122 3 0
# Build CTA model using presence/absence dataframe
# Parameters set as the defaults in Biomod2 modeling options
cta <- rpart(pa ~ .,
data = data,
na.action = na.omit,
method = "class",
control = list(xval = 5,
minbucket = 5,
minsplit = 5,
cp = 0.001,
maxdepth = 25))
# Confirm classes of data before running predict function
data.frame(ctaClass = attr(terms(cta), "dataClasses")[2:5],
rasterFactor = is.factor(data.rast))
# ctaClass rasterFactor
#red numeric FALSE
#green numeric FALSE
#blue numeric FALSE
#geology factor TRUE
# Once again confirming this rasterLayer is a factor
levels(data.rast$geology)
#[[1]]
# ID VALUE
#1 1 1
#2 2 2
#3 3 3
#4 4 4
#5 5 5
#6 6 6
#7 7 7
# Run predict function on rasterStack
cta.predict <- predict(object = data.rast,
model = cta,
type = "class")
#Error: variable 'geology' was fitted with type "factor" but type "numeric" was #supplied
#In addition: Warning message:
#In model.frame.default(Terms, newdata, na.action = na.action, xlev = #attr(object, :
# variable 'geology' is not a factor
EDIT: added proof that it works with a randomForests
model
library(randomForest)
rf <- randomForest(pa ~ .,
data = data,
na.action = na.omit)
rf.predict <- predict(data.rast, rf)
rf.predict
#class : RasterLayer
#dimensions : 77, 101, 7777 (nrow, ncol, ncell)
#resolution : 1, 1 (x, y)
#extent : 0, 101, 0, 77 (xmin, xmax, ymin, ymax)
#crs : +proj=merc +lon_0=0 +k=1 +x_0=0 +y_0=0 +datum=WGS84 +units=m +no_defs
#source : memory
#names : layer
#values : 0, 1 (min, max)
#attributes :
# ID value
# 1 0
# 2 1
In this case you need to help predict
a bit by providing the factor name(s) and levels
data$geology <- as.factor(data$geology)
cta.predict <- predict(data.rast, cta, type="class", factors=list(geology=levels(data$geology)))
Also note the type=
in type=class
, you should cannot just do class
(unless you want the filename
to be class.grd
)
With terra
this works a little better, I think (hope)
library(terra)
x <- rast(data.rast*1)
x$geology <- as.factor(x$geology)
cta.predict <- predict(x, cta, type="class")