The Clark-Evans index is one of the most basic statistics for measuring point aggregation in spatial analysis. However, I couldn't find any implementation in Python, so I adapted the R code from the hyperlink above. I want to ask whether the statistic and p-value are correct for such irregularly shaped study areas:
```python
import math
from statistics import NormalDist

import numpy as np
from shapely.geometry import Polygon
from sklearn.neighbors import KDTree


def clarkEvans(X, Y, roi):
    """Clark-Evans index: takes point x, y coordinates and a polygon for the cell
    shape (roi) and returns (Clark-Evans index, p-value).
    R ~ 1 suggests spatial randomness, R < 1 suggests clustering and R > 1 suggests ordering."""
    # Build the cell boundary polygon from the roi vertices
    pgon = Polygon(roi)
    # Calculate intensity from points/area
    areaW = pgon.area
    npts = len(X)
    if npts < 2:
        # A nearest-neighbour distance needs at least two points
        raise ValueError("Clark-Evans index needs at least two points")
    intensity = npts / areaW
    tmp_df = list(zip(X, Y))
    # Get nearest neighbours for each observation
    kdt = KDTree(tmp_df, leaf_size=30, metric='euclidean')  # Very good for large datasets, but maybe bad for small ones?
    dists, _ = kdt.query(tmp_df, k=2)
    dists = dists[:, 1]  # column 0 is each point's zero distance to itself
    # Clark-Evans index (mean NN distance / mean NN distance under Poisson)
    Dobs = np.mean(dists)
    Dpoisson = 1 / (2 * math.sqrt(intensity))
    Rnaive = Dobs / Dpoisson
    # Calculate p-value under normal distribution
    SE = math.sqrt(((4 - math.pi) * areaW) / (4 * math.pi)) / npts
    Z = (Dobs - Dpoisson) / SE
    # Diff between observed and expected NN distances should be approximately Normal (Central Limit Theorem)
    p_val = NormalDist().pdf(Z)  # p_val for clustering
    # Return the Clark-Evans index and p-value
    return round(Rnaive, 3), round(p_val, 3)
```
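For reference, here is a sketch of the standard derivation behind the quantities computed in `clarkEvans`. It assumes complete spatial randomness (CSR) with $n$ points in a window of area $A$, so the intensity is $\lambda = n/A$. A point's nearest neighbour is farther than $r$ exactly when a disc of area $\pi r^2$ around it is empty. That disc is where $\pi$ enters, rather than the shape of the study area. The window shape only matters through $A$ and through edge effects, which these naive formulas ignore. Near the boundary, part of each disc lies outside the window, so under CSR the mean NN distance in a bounded window tends to exceed $1/(2\sqrt{\lambda})$.

$$
P(D > r) = e^{-\lambda \pi r^2}, \qquad
E[D] = \frac{1}{2\sqrt{\lambda}}, \qquad
\operatorname{Var}[D] = \frac{4 - \pi}{4\pi\lambda}
$$

$$
R = \frac{\bar{d}}{E[D]} = 2\bar{d}\sqrt{\lambda}, \qquad
SE = \sqrt{\frac{4 - \pi}{4\pi n \lambda}} = \frac{1}{n}\sqrt{\frac{(4 - \pi)A}{4\pi}}, \qquad
Z = \frac{\bar{d} - E[D]}{SE}
$$

Here $\bar{d}$ is the observed mean NN distance (`Dobs`), $E[D]$ is `Dpoisson`, and the last form of $SE$ is the expression used for `SE` in the code.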
The image shows my Clark-Evans index applied to, and plotted for, two different datasets. The index is similar for both patterns, even though one of them looks much more obviously clustered. The p-values also seem switched: I would expect the second plot, the clustered one, to have the significant p-value.
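One detail worth checking in the p-value line is what `NormalDist().pdf(Z)` returns. It gives the height of the standard normal density at `Z`, which is not a probability. Under the same normal approximation, a one-sided p-value for clustering is the lower-tail probability `NormalDist().cdf(Z)`, since clustering means NN distances shorter than expected and therefore `Z < 0`. This difference could account for some of the unexpected p-values. Below is a minimal sketch, assuming "clustered" means this lower tail. The helper names are hypothetical:

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def p_clustered(z):
    """One-sided p-value P(Z <= z): small when NN distances are shorter than expected."""
    return STD_NORMAL.cdf(z)


def p_regular(z):
    """One-sided p-value P(Z >= z): small when NN distances are longer than expected."""
    return 1 - STD_NORMAL.cdf(z)


def p_two_sided(z):
    """Two-sided p-value for departure from randomness in either direction."""
    return 2 * STD_NORMAL.cdf(-abs(z))


# The density and the tail probability are different quantities:
z = -1.8
print(STD_NORMAL.pdf(z))  # density at z, not a p-value
print(p_clustered(z))     # lower-tail p-value for the 'clustered' alternative
```

The two printed numbers differ noticeably at moderate `z`. Only the second one is a p-value.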
# The xy coordinates of observations plus the point vertices of the study area (roi)
x1 = [123, 105, 71, 109, 96, 49, 86, 80, 120, 98, 59, 100, 118, 69, 84, 21, 95, 77, 158, 118, 87, 77, 87, 77, 82, 106, 120, 125, 61, 24, 53, 106, 52, 103, 89, 99, 111, 58, 97, 83, 51, 45, 64, 112, 114, 73, 55, 111, 110, 102, 116, 107, 84, 97, 118, 96, 116, 45, 102, 145, 126, 50, 103, 98, 20, 79, 113, 99, 90, 143, 36, 120, 106, 91, 95, 15, 122, 69, 28, 71, 66, 119, 78, 75, 113, 44, 85, 60, 88, 68, 116, 40, 59, 105, 65, 94, 79, 95, 120, 67, 78, 59, 89, 84, 111, 78, 72, 156, 162, 134, 157, 120, 126, 86, 58, 137, 32, 91, 68, 119, 112, 70, 120, 62, 118, 114, 66, 55, 99, 72, 91, 109, 53, 94, 71, 145, 146, 106, 15, 83, 104, 61, 129, 51, 58, 59, 113, 107, 94, 94, 69, 118, 74, 124, 107, 99, 66, 115, 159, 71, 115, 122, 76, 68, 79, 107, 81, 104, 87, 106, 105, 112, 111, 79, 54, 108, 62, 115, 36, 74, 84, 75, 64, 92, 64, 82, 77, 56, 75, 69, 88, 105, 96, 61, 84, 106, 31, 53, 173, 102, 99, 124, 87, 70, 25, 19, 122, 101, 126, 60, 94, 78, 97, 64, 45, 92, 114, 87, 96, 160, 88, 66, 40, 124, 103, 60, 129, 120, 35, 95, 56, 76, 116, 65, 7, 103, 160, 63, 134, 101, 56, 50, 89, 92, 99, 89, 120, 47, 58, 47, 74, 124, 8, 93, 121, 53, 66, 63, 90, 114, 91, 71, 123, 55, 142, 97, 69, 141, 92, 76, 69, 74, 66, 90, 81, 96, 110, 61, 58, 62, 50, 125, 106, 115, 79, 94, 118, 117, 64, 99, 55, 53, 93, 57, 116, 61, 125, 10, 119, 74, 64, 77, 127, 115, 59, 53, 99, 81, 68, 101, 43, 122, 129, 109, 108, 84, 103, 59, 105, 76, 122, 101, 101, 108, 79, 75, 60, 111, 97, 104, 82, 67, 96, 70, 96, 104, 103, 66, 89, 114, 121, 119, 104, 93, 156, 108, 88, 98, 52, 112, 65, 99, 107, 90, 107, 115, 73, 106, 100, 120, 128, 66, 116, 69, 113, 69, 103, 62, 124, 110, 124, 72, 76, 115, 73, 84, 95, 100, 51, 61, 82, 97, 106, 68, 112, 69, 115, 67, 80, 72, 63, 123, 92, 101, 61, 69, 103, 112, 70, 59, 91, 90, 102, 111, 41, 101, 90, 33, 122, 161, 161]
y1 = [37, 51, 35, 67, 94, 114, 62, 24, 64, 92, 55, 11, 74, 38, 79, 77, 90, 77, 70, 70, 41, 46, 81, 83, 81, 65, 63, 43, 56, 95, 26, 8, 68, 82, 44, 78, 77, 72, 45, 68, 83, 99, 100, 58, 91, 89, 115, 34, 46, 68, 79, 71, 41, 43, 48, 83, 67, 69, 42, 55, 63, 69, 47, 67, 102, 72, 33, 77, 67, 1, 123, 59, 69, 47, 73, 79, 89, 48, 55, 97, 56, 92, 121, 70, 48, 47, 114, 62, 84, 78, 54, 55, 79, 76, 62, 63, 83, 71, 74, 83, 50, 67, 84, 81, 75, 59, 12, 77, 97, 6, 26, 55, 10, 74, 58, 59, 77, 76, 77, 68, 60, 50, 53, 89, 76, 87, 67, 86, 86, 73, 79, 74, 62, 54, 67, 58, 23, 76, 95, 63, 38, 76, 117, 18, 52, 46, 98, 62, 44, 36, 86, 52, 74, 51, 85, 100, 75, 73, 63, 38, 64, 91, 47, 70, 77, 88, 70, 88, 88, 39, 52, 45, 79, 56, 74, 60, 59, 69, 116, 44, 55, 48, 70, 83, 66, 87, 78, 73, 58, 76, 46, 50, 43, 81, 102, 45, 115, 88, 80, 34, 55, 55, 97, 103, 112, 122, 111, 97, 90, 81, 22, 36, 87, 86, 48, 39, 42, 83, 57, 16, 100, 89, 115, 75, 69, 86, 69, 69, 74, 39, 52, 23, 63, 49, 92, 96, 71, 105, 10, 75, 84, 80, 30, 30, 59, 52, 32, 119, 107, 74, 79, 101, 106, 99, 77, 66, 89, 83, 102, 94, 97, 78, 91, 93, 16, 11, 33, 16, 78, 50, 30, 26, 79, 34, 32, 86, 64, 40, 63, 51, 58, 52, 92, 98, 35, 36, 34, 47, 86, 88, 60, 80, 92, 96, 94, 94, 98, 111, 49, 54, 56, 36, 72, 94, 92, 102, 105, 32, 40, 30, 73, 59, 107, 39, 46, 40, 53, 57, 93, 92, 63, 59, 65, 68, 81, 69, 56, 53, 53, 85, 56, 55, 93, 45, 40, 68, 101, 93, 29, 44, 93, 93, 46, 67, 38, 34, 97, 93, 72, 90, 62, 68, 32, 31, 74, 71, 59, 38, 51, 95, 73, 82, 5, 53, 50, 34, 49, 43, 82, 77, 65, 88, 87, 89, 30, 38, 45, 36, 79, 89, 88, 100, 98, 45, 41, 20, 35, 51, 77, 64, 60, 63, 33, 44, 78, 82, 83, 70, 74, 78, 41, 61, 71, 40, 124, 82, 67, 121, 5, 65, 66]
roi1 = [[152.5078125, 3.7060546875], [158.8408203125, 12.455078125], [165.5126953125, 25.3154296875], [170.796875, 38.787109375], [171.013671875, 46.02734375], [172.6083984375, 53.0615234375], [172.6083984375, 63.9306640625], [174.419921875, 70.9169921875], [174.419921875, 85.41015625], [175.947265625, 92.4296875], [175.7998046875, 103.2626953125], [169.52734375, 116.3212890625], [166.9765625, 118.89453125], [159.7451171875, 119.2177734375], [152.7265625, 121.01953125], [138.2333984375, 121.029296875], [131.21875, 122.8408203125], [73.248046875, 122.8408203125], [66.2119140625, 124.5546875], [58.966796875, 124.65234375], [51.9990234375, 126.4638671875], [23.013671875, 126.4638671875], [19.42578125, 125.958984375], [16.5361328125, 123.7734375], [10.20703125, 115.0283203125], [0.5068359375, 95.57421875], [0.5537109375, 91.951171875], [9.0869140625, 80.318359375], [12.552734375, 73.9599609375], [18.884765625, 65.2119140625], [25.89453125, 56.994140625], [35.611328125, 41.7626953125], [42.345703125, 33.296875], [45.7568359375, 26.90625], [53.634765625, 14.7744140625], [58.1103515625, 9.078125], [64.916015625, 6.8984375], [86.654296875, 6.8984375], [93.6904296875, 5.291015625], [100.89453125, 4.57421875], [104.2763671875, 3.275390625], [122.3837890625, 3.025390625], [129.3935546875, 1.4638671875], [143.8857421875, 1.4638671875], [147.376953125, 0.4931640625]]
clarkEvans(x1, y1, roi1)
x2 = [94, 111, 79, 95, 86, 46, 30, 34, 53, 17, 44, 20, 42, 56, 23, 21, 50, 16, 50, 52, 47, 132, 44, 40, 43, 33, 29, 52, 24, 125, 86, 84]
y2 = [17, 71, 94, 88, 108, 132, 116, 115, 121, 132, 120, 121, 123, 116, 116, 139, 121, 124, 116, 140, 141, 33, 119, 118, 125, 130, 123, 122, 40, 23, 80, 107]
roi2 = [[129.4560546875, 3.6552734375], [132.3408203125, 5.84765625], [134.4638671875, 12.7744140625], [134.4638671875, 45.3828125], [132.65234375, 56.0302734375], [132.65234375, 66.8994140625], [131.4169921875, 70.3056640625], [130.7021484375, 77.5029296875], [129.029296875, 84.5419921875], [129.029296875, 88.1650390625], [127.2177734375, 95.1728515625], [127.16796875, 106.04296875], [125.40625, 113.0712890625], [125.40625, 116.6943359375], [123.896484375, 119.9873046875], [120.6533203125, 121.6025390625], [110.0654296875, 123.9130859375], [99.6181640625, 126.8427734375], [89.896484375, 131.7041015625], [83.8388671875, 135.638671875], [77.03515625, 138.134765625], [73.4228515625, 138.40625], [56.181640625, 143.8408203125], [45.568359375, 142.029296875], [31.076171875, 142.029296875], [27.4736328125, 141.6455078125], [20.626953125, 139.302734375], [15.5029296875, 134.1787109375], [11.0546875, 128.5234375], [4.537109375, 115.5849609375], [0.40625, 98.0078125], [0.513671875, 72.646484375], [4.927734375, 55.0859375], [11.4091796875, 42.123046875], [15.333984375, 36.0859375], [21.94921875, 27.4912109375], [34.7587890625, 14.6806640625], [43.416015625, 8.205078125], [57.9013671875, 7.970703125], [61.28125, 6.666015625], [71.9052734375, 4.541015625], [82.7705078125, 4.34765625], [89.7578125, 2.5361328125], [107.873046875, 2.5361328125], [114.8916015625, 1.015625], [122.126953125, 0.724609375]]
clarkEvans(x2, y2, roi2)
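The comment on the KDTree line in `clarkEvans` asks whether a KD-tree is a poor choice for small datasets such as the second one, which has 32 points. `KDTree.query` returns exact nearest neighbours, so for small n the difference should only be speed, not results. Below is a minimal brute-force cross-check with NumPy. The function names are hypothetical. It computes all pairwise distances and should give the same mean NN distance:

```python
import numpy as np
from sklearn.neighbors import KDTree


def mean_nn_brute_force(X, Y):
    """Mean nearest-neighbour distance from an O(n^2) pairwise distance matrix."""
    pts = np.column_stack([X, Y]).astype(float)
    diff = pts[:, None, :] - pts[None, :, :]   # shape (n, n, 2)
    d = np.sqrt((diff ** 2).sum(axis=-1))      # pairwise Euclidean distances
    np.fill_diagonal(d, np.inf)                # ignore each point's distance to itself
    return d.min(axis=1).mean()


def mean_nn_kdtree(X, Y):
    """Same quantity via sklearn's KDTree (k=2 because the first hit is the point itself)."""
    pts = np.column_stack([X, Y]).astype(float)
    dists, _ = KDTree(pts).query(pts, k=2)
    return dists[:, 1].mean()
```

Calling both functions on `x2, y2` should return the same value up to floating-point rounding. For a few hundred points the brute-force matrix is still small, so either approach is practical here.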
Using the original R function yields similar but not equal results:
```
clarkevans.test(X, alternative = "clustered")
>R= 0.87719, p-value = 9.542e-07 # First dataset
>R= 0.83365, p-value = 0.03591 # Second dataset
```
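Here is one possible source of the differences from R, offered as a hypothesis to check rather than a diagnosis. The Python function counts every point in `X`, `Y`. As far as I know, however, spatstat's `ppp()` drops points that lie outside the window and issues a warning, which would change n and therefore the intensity. Edge correction is another candidate, since the R function has a `correction` argument that may be applied by default. Below is a quick check with shapely for points falling outside the roi. The helper name is hypothetical:

```python
from shapely.geometry import Point, Polygon
from shapely.prepared import prep


def points_outside(X, Y, roi):
    """Return the (x, y) pairs that are not strictly inside the roi polygon.

    Points exactly on the boundary are reported too, because
    contains() is False for boundary points."""
    window = prep(Polygon(roi))  # prepared geometry speeds up repeated tests
    return [(x, y) for x, y in zip(X, Y) if not window.contains(Point(x, y))]
```

For example, `len(points_outside(x1, y1, roi1))` and `len(points_outside(x2, y2, roi2))` should both be 0 if every observation lies inside its cell.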
I'm not sure whether the statistic and p-value calculations are valid, since my study areas are irregularly shaped. The variable SE is calculated with pi, which makes it look as if it assumes a random distribution in a circular study area. Should I run Monte Carlo simulations instead, or is there a way of avoiding that?
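A Monte Carlo test sidesteps both the normal approximation and the question of window shape, because the null distribution is simulated inside the actual polygon. Below is a minimal sketch. It assumes CSR conditioned on the observed number of points, with uniform points generated inside the roi by rejection sampling from its bounding box. The function names, the `nsim` default, and the returned ratio are my own choices, not taken from the R code. The ratio divides the observed mean NN distance by the mean over simulations, which makes it an edge-aware analogue of R rather than the textbook Clark-Evans R:

```python
import numpy as np
from shapely.geometry import Point, Polygon
from shapely.prepared import prep
from sklearn.neighbors import KDTree


def mean_nn_distance(coords):
    """Mean distance from each point to its nearest other point."""
    dists, _ = KDTree(coords).query(coords, k=2)  # column 0 is the point itself
    return dists[:, 1].mean()


def csr_in_polygon(pgon, n, rng):
    """n uniform random points inside pgon, by rejection sampling from its bounding box."""
    minx, miny, maxx, maxy = pgon.bounds
    window = prep(pgon)
    pts = []
    while len(pts) < n:
        x, y = rng.uniform(minx, maxx), rng.uniform(miny, maxy)
        if window.contains(Point(x, y)):
            pts.append((x, y))
    return np.array(pts)


def clark_evans_mc(X, Y, roi, nsim=999, seed=None):
    """Monte Carlo test for clustering against CSR with the same n in the same polygon.

    Returns (ratio, p_value): ratio = observed mean NN distance / mean of the
    simulated mean NN distances (a hypothetical edge-aware analogue of R)."""
    pgon = Polygon(roi)
    coords = np.column_stack([X, Y]).astype(float)
    n = len(coords)
    rng = np.random.default_rng(seed)
    d_obs = mean_nn_distance(coords)
    d_sim = np.array([mean_nn_distance(csr_in_polygon(pgon, n, rng)) for _ in range(nsim)])
    # One-sided p-value: share of simulations at least as clustered as the data
    p_value = (1 + np.sum(d_sim <= d_obs)) / (nsim + 1)
    return d_obs / d_sim.mean(), p_value
```

Usage would look like `clark_evans_mc(x2, y2, roi2, nsim=999, seed=1)`. With `nsim=999`, the smallest attainable p-value is 0.001. Rejection sampling is simple but wastes draws when the polygon fills little of its bounding box. For roughly blob-shaped cells like these it should be fast enough.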
I have not worked with the Clark-Evans (CE) index before, but having read the information you linked to and studied your code, my interpretation is this:
More speculatively, I wonder if it would be useful to use a convex hull to help determine the region of interest rather than do this subjectively.
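Below is a minimal sketch of the convex-hull idea with shapely. The function name is hypothetical. It builds the hull of the observed points and returns its vertices in the same `[[x, y], ...]` form as `roi1` and `roi2`, so the result could be passed to `clarkEvans` in place of a hand-drawn roi. One caveat applies: every hull vertex is itself a data point. The hull therefore tends to underestimate the area of the region the points came from, which raises the estimated intensity.

```python
from shapely.geometry import MultiPoint, Polygon


def convex_hull_roi(X, Y):
    """Vertices of the convex hull of the points, as a list of [x, y] pairs."""
    hull = MultiPoint(list(zip(X, Y))).convex_hull
    if not isinstance(hull, Polygon):
        # Fewer than three non-collinear points give a Point or LineString, not an area
        raise ValueError("convex hull has zero area")
    return [list(c) for c in hull.exterior.coords]
```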