I am trying to predict the occurrence of a species (presence/absence) using a GAM in R. I have fit the GAM, but I get an error when I try to make the prediction to spatial data with terra::predict()
. I get this error:
# Error invalid name(s)
I also do not understand why my predictions aren't binary (ie 1/0 for presence/absence)?
library(terra)
library(mgcv)
library(sf)
#polygons to generate random points within
v <- vect(system.file("ex/lux.shp", package="terra")) |> st_as_sf()
#get a raster from terra
r <- rast(system.file("ex/elev.tif", package="terra"))
# Create 50 random points
set.seed(50)
pnts <- st_sample(v, size = 50, type = "random") |> st_as_sf()
# Return the exact raster value the point lies on
pnts_r <- terra::extract(r, pnts, method = "simple")
# Add occurrence column
pnts_r$occ <- as.factor(sample(0:1, 50, replace = TRUE))
# Fit the GAM
mod_gam <- mgcv::gam(formula = occ ~ elevation,
data = pnts_r,
family = binomial(link = "logit"),
method = "REML")
# Predict using our training data
predict(mod_gam) # Why aren't the responses 1s and 0s?
# Predict using the entire raster
terra::predict(mod_gam, r, fun = "predict", na.rm=TRUE)
# Error invalid name(s)
I understand the variable names in the model need to be identical to those in the raster, but in this case there is only 1 ("elevation") and its identical in both.
You have the order of arguments wrong in predict
.
To use terra::predict
the first argument should be a SpatRaster but you use the gam model. Since "predict" is a generic function, the implementation of predict
is based on the class of the first argument, and this will take you to mgcv::predict.gam
instead).
Example data
library(terra)
library(mgcv)
v <- vect(system.file("ex/lux.shp", package="terra"))
r <- rast(system.file("ex/elev.tif", package="terra"))
set.seed(50)
pnts <- spatSample(v, size = 50)
pnts_r <- terra::extract(r, pnts, method = "simple")
pnts_r$occ <- sample(0:1, 50, replace = TRUE)
mod_gam <- mgcv::gam(occ ~ elevation, data = pnts_r,
family = binomial(link = "logit"), method = "REML")
Test the model, use argument type="response"
if you want probabilities
p <- predict(mod_gam, type="response")
head(p)
#0.3965005 0.6287751 0.5908377 0.4722339 0.4477870 0.3605875
And now with the raster data; again with type="response"
.
x <- terra::predict(r, mod_gam, type="response")
x
#class : SpatRaster
#dimensions : 90, 95, 1 (nrow, ncol, nlyr)
#resolution : 0.008333333, 0.008333333 (x, y)
#extent : 5.741667, 6.533333, 49.44167, 50.19167 (xmin, xmax, ymin, ymax)
#coord. ref. : lon/lat WGS 84 (EPSG:4326)
#source(s) : memory
#varname : elev
#name : elevation
#min value : 0.3231922
#max value : 0.6545229
Inspect
plot(x)
# plot(x > .5)
points(pnts, col="red")
Alternatively, you can keep the order you have, by naming the arguments:
terra::predict(model=mod_gam, object=r, type="response")