Search code examples
rregressionlogistic-regressionspatialautocorrelation

Accounting for Spatial Autocorrelation in Model


I am trying to account for spatial autocorrelation in a model in R. Each observation is a country for which I have the average latitude and longitude. Here's some sample data:

country <- c("IQ", "MX", "IN", "PY")
long <- c(43.94511, -94.87018, 78.10349, -59.15377)
lat <- c(33.9415073, 18.2283975, 23.8462264, -23.3900255)
Pathogen <- c(10.937891, 13.326284, 12.472374, 12.541716)
Answer.values <- c(0, 0, 1, 0)

data <- data.frame(country, long, lat, Pathogen, Answer.values)

I know spatial autocorrelation is an issue (Moran's i is significant in the whole dataset). This is the model I am testing (Answer Values (a 0/1 variable) ~ Pathogen Prevalence (a continuous variable)).

model <- glm(Answer.values ~ Pathogen,
             na.action = na.omit,
             data = data,
             family = "binomial")

How would I account for spatial autocorrelation with a data structure like that?


Solution

  • There are a lot of potential answers to this. One easy(ish) way is to use mgcv::gam() to add a spatial smoother. Most of your model would stay the same:

    library(mgcv)
    gam(Answer.values ~ Pathogen +s([something]),
        family="binomial",
        data=data)
    

    where s([something]) is some form of smooth spatial term. Three possible/reasonable choices would be:

    • a spherical spline (?mgcv::smooth.construct.sos.smooth.spec), which takes lat/long as input; this would be useful if (1) you have data over a significant fraction of the earth's surface (so that a smoother that constructs a 2D planar spatial smooth is less reasonable); (2) you want to account for distance between locations in a continuous way
    • a Markov random field (?mgcv::smooth.construct.mrf.smooth.spec). This is essentially the spatial analogue of a discrete order-1 autoregressive structure (i.e. countries are directly correlated only with their direct neighbours, however you choose to define that). In order to do this you have to come up somehow with a neighbourhood list (i.e. a list of countries, where the elements are lists of countries that are neighbours of the original countries). You could do this however you like, e.g. by finding nearest neighbours geographically. (Check out some introductions to spatial statistics/spatial data analysis in R.) (On the other hand, if you're testing Moran's I then you've presumably already come up with some way to identify first-order neighbours ...)
    • if you're comfortable treating lat/long as coordinates in a 2D plane, then you have lot of choices of smoothing basis, e.g. ?mgcv::smooth.construct.gp.smooth.spec (Gaussian process smoothers, which include most of the standard spatial autocorrelation models as special cases)

    A helpful link for getting up to speed with GAMs in R ...