r statistics classification spatstat

Point pattern classification with spatstat: what am I doing wrong?


I'm trying to classify bivariate point patterns into groups using spatstat. The patterns are derived from whole-slide images of lymph nodes with cancer. I've trained a neural network to recognize cells of three types (cancer “LP”, immune cells “bcell”, and all other cells). I do not wish to analyse the other cells, but I use them to construct a polygonal window in the shape of the lymph node. Thus, the patterns to be analysed are immune cells and cancer cells in polygonal windows. Each pattern can have several tens of thousands of cancer cells and up to 2 million immune cells. The patterns are of the “Small World Model” type, as there is no possibility of points lying outside the window.

My classification should be based on the position of the cancer cells in relation to the immune cells. For example, most cancer cells lie on the “islands” of immune cells, but in some cases the cancer cells are (seemingly) uniformly dispersed and there are only a few immune cells. In addition, the patterns are not always uniform across the node. As I'm rather new to spatial statistics, I developed a simple and crude method to classify the patterns. Here it is in short:

  1. I calculated a kernel density of the immune cells with sigma=80 because this looked “nice” to me: Den <- density(split(cells)$"bcell", sigma=80, window=cells$window). (Should I have used e.g. sigma=bw.scott instead?)
  2. Then I created a tessellation image by dividing density range in 3 parts (here again, I experimented with the breaks to get some “good looking results”).
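    # Break points at 1/3 and 2/3 of the maximum density value
    rangesDenMax <- 2 * range(Den)[2] / 3
    rangesDenMin <- range(Den)[2] / 3
    map.breaks <- c(-Inf, rangesDenMin, rangesDenMax, Inf)
    # Convert the density image into a three-level factor image
    map.cuts <- cut(Den, breaks = map.breaks,
                    labels = c("Low B-cell density", "Medium B-cell density", "High B-cell density"))
    # Build the tessellation from the factor image
    map.quartile <- tess(image = map.cuts, window = cells$window)
    tessImage <- map.quartile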

Here are some example plots of the tessellations with the cancer cells overlaid (white dots). The lymph node on the left has typical, uniformly distributed “islands” of immune cells, while the node on the right has only a few dense spots of immune cells, and its cancer cells are not restricted to those spots:

heat map: immune cell kernel density, white dots: cancer cells

  3. Then I measured a silly number of variables, which should give me a clue as to how the cancer cells are distributed across the tessellation tiles (the calculation code is trivial, so I post only the description of my variables):
LPlwB<-c() # proportion of cancer cells in low-b-cell-area 
LPmdB<-c() # proportion of cancer cells in medium-b-cell-area 
LPhiB<-c() # proportion of cancer cells in high-b-cell-area
AlwB<-c()  # proportion of the low-b-cell area
AmdB<-c()  # proportion of the medium-b-cell area
AhiB<-c()  # proportion of the high-b-cell area
LPm1<-c()  # mean distance to the 1st neighbour
LPm2<-c()  # mean distance to the 2nd neighbour
LPm3<-c()  # mean distance to the 3rd neighbour
LPsd1<-c() # standard deviation of the distance to the 1st neighbour
LPsd2<-c() # standard deviation of the distance to the 2nd neighbour
LPsd3<-c() # standard deviation of the distance to the 3rd neighbour
meanQ<-c() # mean quadrat count (I visually chose the quadrat size to be not too large and not too small)
sdevQ<-c() # standard deviation of the quadrat counts
hiSAT<-c() # realised cancer cells saturation in high b-cell-area (number of cells observed divided by a number of cells, which could be fitted into the area considering the observed min distance between the cells)
mdSAT<-c() # realised cancer cells saturation in medium b-cell-area 
lwSAT<-c() # realised cancer cells saturation in low b-cell-area 
ll<-c() # Proportion LP neighbours of LP (contingency table count divided by total points) 
lb<-c() # Proportion b-cell neighbours of LP
bl<-c() # Proportion LP neighbours of b-cells
bb<-c() # Proportion b-cell neighbours of b-cells
  4. I z-scaled the variables, inspected them on a PCA plot (the vectors pointed in different directions like the needles of a sea urchin) and performed a hierarchical cluster analysis. I chose k by calculating fviz_nbclust(scaled_variables, hcut, method = "silhouette"). After dividing the dendrogram into k clusters and checking the cluster stability, I ended up with my groups, which seemed to make sense, as cases with “islands” were separated from the "more dispersed" ones.

However, given the possibilities of the spatstat package I strongly feel like hitting nails into the wall with a smartphone.


Solution

  • It seems you are trying to quantify the way in which the cancer cells are positioned relative to the immune cells. You could do this by something like

    Cancer <- split(cells)[["LP"]]
    Immune <- split(cells)[["bcell"]]
    Dimmune <- density(Immune, sigma=80)
    f <- rhohat(Cancer, Dimmune)
    plot(f)
    

    Then f is a function that indicates the intensity (number per unit area) of cancer cells as a function of the density of immune cells. The plot shows the density of cancer cells on the vertical axis, against the density of immune cells on the horizontal axis.

    If the graph of this function is flat, it means that the cancer cells are not paying attention to the density of immune cells. If the graph is steeply declining, it means that cancer cells tend to avoid immune cells. If it is rising, cancer cells tend to concentrate where immune cells are dense.
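
    To get a feel for what a flat versus a non-flat curve looks like, here is a minimal sketch on simulated data. Everything in it is an assumption for illustration only: the 1000 x 1000 window, the Thomas cluster process standing in for immune-cell "islands", and the chosen intensities are hypothetical and not taken from the slides in the question.

    ```r
    library(spatstat)

    set.seed(1)
    W <- square(1000)  # hypothetical window, not a real lymph node outline

    # Stand-in for immune cells: a clustered pattern forming "islands"
    Immune <- rThomas(kappa = 2e-5, scale = 40, mu = 50, win = W)
    # positive = TRUE avoids tiny negative values from the FFT smoothing
    Dimmune <- density(Immune, sigma = 80, positive = TRUE)

    # Case 1: cancer cells ignore the immune cells (homogeneous Poisson)
    CancerFlat <- rpoispp(5e-4, win = W)
    # Case 2: cancer cell intensity is proportional to the immune density
    CancerIsland <- rpoispp(0.5 * Dimmune)

    fFlat <- rhohat(CancerFlat, Dimmune)
    fIsland <- rhohat(CancerIsland, Dimmune)

    plot(fFlat, main = "cancer ignores immune density")
    plot(fIsland, main = "cancer follows immune density")
    ```

    In the second case the true relationship is a straight line with slope 0.5, so the estimated curve should rise with the immune density, while the first should stay roughly level within its confidence band.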

    I suggest you first look at the plot of f for some example datasets to decide whether f has any ability to discriminate between spatial arrangements that you think should be classified as different. If so, you can use as.data.frame to extract the values of f and then use classical discriminant analysis (etc.) to classify the slide images into groups.
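
    One possible way to turn the curves into a feature matrix is sketched below. It rests on several assumptions: rho_features and sim_slide are hypothetical helpers, the slides are simulated stand-ins, and the curves are interpolated onto a common grid of immune-density values (zgrid, chosen arbitrarily here) because each slide covers a different density range. Discriminant analysis would need known group labels, so this sketch uses hierarchical clustering as in the question instead.

    ```r
    library(spatstat)

    # Hypothetical helper: estimate rho-hat for one slide and interpolate
    # the curve onto a common grid of immune-density values
    rho_features <- function(cells, zgrid, sigma = 80) {
      Cancer <- split(cells)[["LP"]]
      Immune <- split(cells)[["bcell"]]
      Dimmune <- density(Immune, sigma = sigma)
      f <- rhohat(Cancer, Dimmune)
      df <- as.data.frame(f)
      x <- df[[fvnames(f, ".x")]]  # covariate values (immune density)
      y <- df[[fvnames(f, ".y")]]  # estimated cancer cell intensity
      approx(x, y, xout = zgrid, rule = 2)$y  # constant extension outside range
    }

    # Simulated stand-ins for real slides (assumption, for illustration only)
    set.seed(42)
    W <- square(1000)
    sim_slide <- function(island) {
      Immune <- rThomas(kappa = 2e-5, scale = 40, mu = 50, win = W)
      lam <- density(Immune, sigma = 80, positive = TRUE)
      Cancer <- if (island) rpoispp(0.5 * lam) else rpoispp(5e-4, win = W)
      superimpose(LP = Cancer, bcell = Immune, W = W)
    }
    slides <- lapply(c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE), sim_slide)

    zgrid <- seq(0, 3e-3, length.out = 20)
    feat <- t(sapply(slides, rho_features, zgrid = zgrid))  # one row per slide

    # Unsupervised grouping of the curves
    hc <- hclust(dist(feat), method = "ward.D2")
    groups <- cutree(hc, k = 2)
    ```

    With real data, slides would be a list of the marked point patterns, and zgrid should be chosen to cover the range of density values that most slides share.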

    Instead of density(Immune) you could use any other summary of the immune cells. For example, D <- distfun(Immune) would give you the distance to the nearest immune cell, and then rhohat(Cancer, D) would estimate the density of cancer cells as a function of the distance to the nearest immune cell. And so on.
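
    A minimal sketch of the distance-based variant, again on simulated data (the window, the cluster process and the intensities are hypothetical choices for illustration):

    ```r
    library(spatstat)

    set.seed(7)
    W <- square(1000)
    Immune <- rThomas(kappa = 2e-5, scale = 40, mu = 50, win = W)
    Cancer <- rpoispp(5e-4, win = W)

    # Covariate: distance from any location to the nearest immune cell
    D <- distfun(Immune)

    # Cancer cell intensity as a function of that distance
    g <- rhohat(Cancer, D)
    plot(g)
    ```

    Here a curve that declines with distance would indicate cancer cells sitting close to immune cells, and a flat curve would indicate no dependence.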