Search code examples
rrasterpredictterragam

Create a logit GAM species prediction raster using Terra


I am trying to predict the occurrence of a species (presence/absence) using a GAM in R. I have fit the GAM, but I get an error when I try to make the prediction to spatial data with terra::predict(). I get this error:

# Error invalid name(s)

I also do not understand why my predictions aren't binary (ie 1/0 for presence/absence)?

library(terra)
library(mgcv)
library(sf)

#polygons to generate random points within
v <- vect(system.file("ex/lux.shp", package="terra")) |> st_as_sf()

#get a raster from terra 
r <- rast(system.file("ex/elev.tif", package="terra"))

# Create 50 random points
set.seed(50)
pnts <- st_sample(v, size = 50, type = "random") |> st_as_sf()

# Return the exact raster value the point lies on
pnts_r <-  terra::extract(r, pnts, method = "simple") 

# Add occurrence column
pnts_r$occ <- as.factor(sample(0:1, 50, replace = TRUE))

# Fit the GAM
mod_gam <- mgcv::gam(formula = occ ~ elevation, 
                     data = pnts_r,
                     family = binomial(link = "logit"),
                     method = "REML")

# Predict using our training data
predict(mod_gam) # Why aren't the responses 1s and 0s?

# Predict using the entire raster
terra::predict(mod_gam, r, fun = "predict", na.rm=TRUE) 
# Error invalid name(s)

I understand the variable names in the model need to be identical to those in the raster, but in this case there is only 1 ("elevation") and its identical in both.


Solution

  • You have the order of arguments wrong in predict.

    To use terra::predict the first argument should be a SpatRaster but you use the gam model. Since "predict" is a generic function, the implementation of predict is based on the class of the first argument, and this will take you to mgcv::predict.gam instead).

    Example data

    library(terra)
    library(mgcv)
    v <- vect(system.file("ex/lux.shp", package="terra"))
    r <- rast(system.file("ex/elev.tif", package="terra"))
    set.seed(50)
    pnts <- spatSample(v, size = 50)
    pnts_r <- terra::extract(r, pnts, method = "simple") 
    pnts_r$occ <- sample(0:1, 50, replace = TRUE)
    mod_gam <- mgcv::gam(occ ~ elevation, data = pnts_r, 
          family = binomial(link = "logit"), method = "REML")
    

    Test the model, use argument type="response" if you want probabilities

    p <- predict(mod_gam, type="response")
    head(p)
    #0.3965005 0.6287751 0.5908377 0.4722339 0.4477870 0.3605875 
    

    And now with the raster data; again with type="response".

    x <- terra::predict(r, mod_gam, type="response")
    
    x
    #class       : SpatRaster 
    #dimensions  : 90, 95, 1  (nrow, ncol, nlyr)
    #resolution  : 0.008333333, 0.008333333  (x, y)
    #extent      : 5.741667, 6.533333, 49.44167, 50.19167  (xmin, xmax, ymin, ymax)
    #coord. ref. : lon/lat WGS 84 (EPSG:4326) 
    #source(s)   : memory
    #varname     : elev 
    #name        : elevation 
    #min value   : 0.3231922 
    #max value   : 0.6545229 
    

    Inspect

    plot(x)
    # plot(x > .5)
    points(pnts, col="red")
    

    map

    Alternatively, you can keep the order you have, by naming the arguments:

    terra::predict(model=mod_gam, object=r, type="response")