
'spatstat' package: Fitting cluster process models


I have several questions about fitting cluster process models with the spatstat package, for a problem I am working on.

  1. Is there a recommended minimum sample size consideration when fitting a cluster process model to get reliable estimates for the model parameters?

  2. If it is known that the point pattern is inhomogeneous and the intensity varies with x and y, is it correct to include x and y as an interaction term in the model? In my scenario I think a modified Thomas model would be ideal. So will the following be correct in implementing this?

kppm(ppdata ~ x*y, clusters ="Thomas", method = "palm", statistic = "Kinhom")

Although the cluster intensity and the sibling probability change when I use this, the scale parameter is estimated to be the same value as in the homogeneous case below:

kppm(ppdata ~ 1, clusters ="Thomas", method = "palm", statistic = "K")

Thanks in advance!


Solution

  • Is there a recommended minimum sample size consideration when fitting a cluster process model to get reliable estimates for the model parameters?

    This is a topic of current research. The answer depends on the strength and range of clustering (as well as on the definition of "reliable" estimates). Most of the standard test case examples contain 100 to 200 points.
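    One way to judge what counts as "reliable" in your own setting is a small simulation study: simulate Thomas patterns with parameter values you consider plausible, refit them, and look at the spread of the estimates. The sketch below assumes hypothetical "true" values (kappa = 10, scale = 0.05, and mu = 5 or 20 in the unit square, i.e. roughly 50 or 200 points on average); the helper `sim_fit` is illustrative and not part of spatstat.

```r
library(spatstat)
set.seed(1)

# Hypothetical helper: simulate one Thomas pattern, refit it by Palm
# likelihood, and return the point count together with the fitted
# cluster parameters (kappa, scale).
sim_fit <- function(kappa, scale, mu) {
  X <- rThomas(kappa, scale, mu, win = square(1))
  fit <- kppm(X ~ 1, clusters = "Thomas", method = "palm")
  c(n = npoints(X), fit$clustpar)
}

nsim <- 10  # kept small for speed; use many more replicates in a real study

# Expected number of points in the unit square is kappa * mu
small <- t(replicate(nsim, sim_fit(kappa = 10, scale = 0.05, mu = 5)))
large <- t(replicate(nsim, sim_fit(kappa = 10, scale = 0.05, mu = 20)))

# Spread of the scale estimates around the true value 0.05
c(small = sd(small[, "scale"]), large = sd(large[, "scale"]))
```

    Repeating this with clustering strengths and ranges that resemble your data gives a direct, if rough, answer for your particular case.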

  • If it is known that the point pattern is inhomogeneous and the intensity varies with x and y, is it correct to include x and y as an interaction term in the model? In my scenario I think a modified Thomas model would be ideal. So will the following be correct in implementing this?

      kppm(ppdata ~ x*y, clusters ="Thomas", method = "palm", statistic = "Kinhom")

    The argument statistic is used only when method="mincontrast"; it is ignored when method="palm". So statistic="Kinhom" has no effect in your call.
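    The difference can be seen by fitting the same model both ways. A minimal sketch, using the built-in `redwood` pattern as a stand-in for `ppdata` and a simple trend `~ x` chosen only for illustration:

```r
library(spatstat)

# Palm likelihood: 'statistic' is not used, so it is left out here
fit_palm <- kppm(redwood ~ x, clusters = "Thomas", method = "palm")

# Minimum contrast: 'statistic' chooses the summary function ("K" or "pcf")
fit_K   <- kppm(redwood ~ x, clusters = "Thomas",
                method = "mincontrast", statistic = "K")
fit_pcf <- kppm(redwood ~ x, clusters = "Thomas",
                method = "mincontrast", statistic = "pcf")

# Fitted cluster parameters (kappa, scale) from the three fits
est <- rbind(palm = fit_palm$clustpar,
             K    = fit_K$clustpar,
             pcf  = fit_pcf$clustpar)
est
```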

    In a formula, x*y means x + y + x:y. When the covariates x and y are numeric, x:y means the product xy. So the formula ppdata ~ x*y specifies that the intensity function takes the form lambda(x,y) = exp(a + bx + cy + dxy), where a, b, c, d are coefficients that will be estimated.

    If you're including the term xy, you may as well include all the terms of order 2, using the formula ppdata ~ polynom(x,y,2), which specifies lambda(x,y) = exp(a + bx + cy + dxy + ex^2 + fy^2).
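    The formula expansion can be checked in base R, without spatstat. The small data frame `d` below is made up purely for illustration, and `f2` is written by hand as a base-R formula with the same five non-constant terms that polynom(x,y,2) generates.

```r
# How R expands the '*' operator in a model formula
f <- ppdata ~ x * y
attr(terms(f), "term.labels")   # "x" "y" "x:y"

# With numeric covariates, the x:y column of the design matrix is the product x*y
d  <- data.frame(x = c(0.1, 0.5, 0.9), y = c(0.2, 0.4, 0.8))
mm <- model.matrix(~ x * y, data = d)
colnames(mm)                    # "(Intercept)" "x" "y" "x:y"
all.equal(unname(mm[, "x:y"]), d$x * d$y)   # TRUE

# Hand-written equivalent of the terms in polynom(x, y, 2)
f2 <- ~ x + y + x:y + I(x^2) + I(y^2)
ncol(model.matrix(f2, data = d)) - 1   # 5 terms besides the intercept
```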

    In general terms, if the intensity depends on spatial location then you can either assume that the intensity follows a particular functional form (by specifying a model formula) or estimate the intensity nonparametrically, by kernel smoothing or similar methods. If there's no additional information about the form of the intensity then there's no right or wrong way to estimate it.
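    As a rough illustration of the two routes, the sketch below fits a parametric log-quadratic trend and a kernel estimate to the built-in `redwood` pattern (used only as a stand-in for `ppdata`) and compares them. The comparison by total integral and by values at the data points is just one simple choice.

```r
library(spatstat)
X <- redwood  # built-in clustered pattern standing in for ppdata

# Parametric route: log-quadratic intensity, fitted together with
# the Thomas cluster parameters
fit_par <- kppm(X ~ polynom(x, y, 2), clusters = "Thomas", method = "palm")
lam_par <- predict(fit_par)   # fitted intensity as a pixel image

# Nonparametric route: kernel estimate with bandwidth selected by bw.ppp
lam_np <- density(X, bw.ppp, positive = TRUE)

# Both should integrate to roughly the observed number of points
c(n = npoints(X), parametric = integral(lam_par), kernel = integral(lam_np))

# Intensity values at the data points under each estimate
summary(lam_par[X])
summary(lam_np[X])
```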

    For the nonparametric approach, first estimate the intensity at the data points by kernel smoothing (for example):

      library(spatstat)
      # Leave-one-out kernel estimate of the intensity at each data point
      LambdaX <- density(ppdata, bw.ppp, at="points", leaveoneout=TRUE, positive=TRUE)


    Then

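       # Inhomogeneous K function, using the intensity values at the data points
       K <- Kinhom(ppdata, LambdaX)
       # Minimum contrast fit of the Thomas cluster parameters to K
       M <- thomas.estK(K)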
    

    Then M gives the fitted cluster parameters of the inhomogeneous Thomas process. Alternatively, if you want a kppm object:

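      # Kernel estimate of the intensity as a pixel image
      Lambda <- density(ppdata, bw.ppp, positive=TRUE)
      # Intensity proportional to Lambda: the offset fixes its shape, the intercept is still fitted
      M <- kppm(ppdata ~ offset(log(Lambda)), clusters="Thomas", method="palm")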
    

    Finally, a word of caution. In all of the above examples, a particular kind of inhomogeneity is assumed (second-order intensity-reweighted stationarity: once the varying intensity is accounted for, the pair correlation depends only on the distance between points), which allows very convenient model-fitting. Simply knowing that the pattern is inhomogeneous does not guarantee that it has this particular, convenient type of inhomogeneity. For further information, see Chapter 12 of the spatstat book.