Search code examples
rrasterpredictr-rasterrpart

Raster predict function changes factor to numeric and gives error


I am using the Biomod2 package to run a series of species distribution models in R. One of the modelling techniques I am using is a classification tree analysis (CTA) which uses the rpart package.

The response in these models are presence/absence of a plant species and the predictor variables are contained in a rasterStack. Most of the variables in the rasterStack are continuous numeric variables with the exception of one land cover variables, geology, which is a factor. I stacked each individual rasterLayer and after, used as.factor() to convert the geology layer to a factor.

I am running into an error message when trying to predict from the CTA. The CTA model was built with a data frame in which "geology" is a factor (see below) and used the raster predict function on a rasterStack ("geology" is a factor, see below). However, running the predict function, I get an error saying I supplied a numeric instead of a factor. I have checked all possible points to see if somehow "geology" get converted back to numeric but it seems to be a factor (as it should) everywhere I look.

EDIT: Changed data to make reproduceable.

library(raster)
library(rpart)

set.seed(123)

# Create sample rasterStack
data.rast <- stack(system.file("external/rlogo.grd", package = "raster"))
# Create one layer as a factor 
data.rast$geology <- as.factor(sampleInt(7, length(data.rast$red), replace = TRUE))

# Create sample presence/absence data by randomly selecting cells of raster
data <- as.data.frame(data.rast)
data <- data[sample(nrow(data), 300, replace = FALSE), ]
data$pa <- as.factor(sample(0:1, nrow(data), replace = TRUE))
names(data)[4] <- "geology"

head(data)
#     red green blue geology pa
#2463 251   255  255       7  1
#1944 191   190  186       5  0
#5016 162   174  226       7  0
#5771 255   255  253       4  1
#3739 204   205  199       7  0
#5483 131   133  122       3  0

# Build CTA model using presence/absence dataframe
# Parameters set as the defaults in Biomod2 modeling options
cta <- rpart(pa ~ .,
              data = data,
              na.action = na.omit,
              method = "class",
              control = list(xval = 5, 
                             minbucket = 5, 
                             minsplit = 5,
                             cp = 0.001,
                             maxdepth = 25))

# Confirm classes of data before running predict function
data.frame(ctaClass = attr(terms(cta), "dataClasses")[2:5],
            rasterFactor = is.factor(data.rast))
#        ctaClass rasterFactor
#red      numeric        FALSE
#green    numeric        FALSE
#blue     numeric        FALSE
#geology   factor         TRUE

# Once again confirming this rasterLayer is a factor
levels(data.rast$geology)
#[[1]]
#  ID VALUE
#1  1     1
#2  2     2
#3  3     3
#4  4     4
#5  5     5
#6  6     6
#7  7     7

# Run predict function on rasterStack
cta.predict <- predict(object = data.rast, 
                        model = cta,
                        type = "class")
#Error: variable 'geology' was fitted with type "factor" but type "numeric" was #supplied
#In addition: Warning message:
#In model.frame.default(Terms, newdata, na.action = na.action, xlev = #attr(object,  :
#  variable 'geology' is not a factor

EDIT: added proof that it works with a randomForests model

library(randomForest)
rf <- randomForest(pa ~ .,
                    data = data,
                    na.action = na.omit)
rf.predict <- predict(data.rast, rf)        

rf.predict
#class      : RasterLayer 
#dimensions : 77, 101, 7777  (nrow, ncol, ncell)
#resolution : 1, 1  (x, y)
#extent     : 0, 101, 0, 77  (xmin, xmax, ymin, ymax)
#crs        : +proj=merc +lon_0=0 +k=1 +x_0=0 +y_0=0 +datum=WGS84 +units=m +no_defs 
#source     : memory
#names      : layer 
#values     : 0, 1  (min, max)
#attributes :
# ID value
#  1     0
#  2     1

Solution

  • In this case you need to help predict a bit by providing the factor name(s) and levels

    data$geology <- as.factor(data$geology)
    cta.predict <- predict(data.rast, cta, type="class", factors=list(geology=levels(data$geology)))
    

    Also note the type= in type=class, you should cannot just do class (unless you want the filename to be class.grd)

    With terra this works a little better, I think (hope)

    library(terra)
    x <- rast(data.rast*1)
    x$geology <- as.factor(x$geology)
    cta.predict <- predict(x, cta, type="class")