
Set weights for ewcdf {spatstat} [R]


I want to compare a reference distribution d_1 with a sample d_2, drawn from d_1 with probability proportional to size (weights w_1), using the Kolmogorov–Smirnov distance.

Because d_2 is a size-weighted sample, I considered accounting for this with a weighted empirical cumulative distribution function in R (ewcdf from the spatstat package).
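For reference, a normalised weighted ECDF puts mass w_i / sum(w) on each observation x_i, so F_w(t) = sum(w_i * [x_i <= t]) / sum(w). The base-R sketch below assumes exactly that definition. `weighted_ecdf()` and `ks_distance()` are hypothetical helpers written for illustration, not spatstat functions. As far as I understand its documentation, spatstat's ewcdf with normalise = TRUE is meant to compute the same step function. Because both ECDFs are right-continuous step functions, the Kolmogorov–Smirnov distance between them is found exactly by evaluating them on the union of their jump points.

```r
# Hypothetical helper (not a spatstat function): normalised weighted ECDF,
# F_w(t) = sum(w_i * (x_i <= t)) / sum(w_i)
weighted_ecdf <- function(x, w = rep(1, length(x))) {
  stopifnot(length(x) == length(w), all(w >= 0), sum(w) > 0)
  o <- order(x)
  xs <- x[o]
  cw <- cumsum(w[o]) / sum(w)   # cumulative weight share up to each sorted value
  # findInterval() counts how many sorted values are <= t (ties included)
  function(t) c(0, cw)[findInterval(t, xs) + 1]
}

# Hypothetical helper: Kolmogorov-Smirnov distance between two step-function CDFs.
# Both are constant between jump points, so the supremum of |F - G| is attained
# at one of the jump points listed in `support`.
ks_distance <- function(F, G, support) {
  max(abs(F(support) - G(support)))
}

x <- c(3, 1, 2, 2, 5)
support <- sort(unique(x))

# With equal weights the weighted ECDF coincides with base R's ecdf()
ks_equal <- ks_distance(weighted_ecdf(x), ecdf(x), support)
ks_equal                  # 0

# Unequal weights: mass 1/4, 1/4, 1/2 on the values 1, 2, 3
F_w <- weighted_ecdf(c(1, 2, 3), c(1, 1, 2))
F_w(c(0, 1, 2, 3, 4))     # 0.00 0.25 0.50 1.00 1.00
```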

The example below suggests that I am mis-specifying the weights: when length(d_1) == length(d_2), the Kolmogorov–Smirnov distance is not 0.

Can someone help me with this? For clarity, see the reproducible example below.

library(spatstat)  # provides ewcdf()

# loop over sample sizes 1:length(d_1)
d_stat <- data.frame(sample_size = 1:1000, ks_distance = NA_real_)

for (i in 1:1000) {

  # reference distribution
  d_1 <- rpois(1000, 500)
  w_1 <- d_1 / sum(d_1)
  m_1 <- data.frame(d_1, w_1)

  # sample of size i from the reference distribution, probability proportional to w_1
  m_2 <- m_1[sample(nrow(m_1), size = i, prob = w_1, replace = FALSE), ]
  d_2 <- m_2$d_1
  w_2 <- m_2$w_1

  # ewcdf for the reference distribution and the sample
  f_d_1 <- ewcdf(d_1)
  f_d_2 <- ewcdf(d_2, 1/w_2, normalise = FALSE, adjust = 1/length(d_2))

  # Kolmogorov-Smirnov distance, evaluated at the sampled values
  d_stat[i, 2] <- max(abs(f_d_1(d_2) - f_d_2(d_2)))
}

d_stat[1000, 2]

Solution

  • Your code generates some data d_1 and associated numeric weights w_1, which are then treated as a reference population. It takes a random sample d_2 from this population, with sampling probabilities proportional to the weights w_1. From the sample, it computes the weighted empirical distribution function f_d_2 of the sampled values d_2, with weights inversely proportional to the original sampling probabilities. By the Horvitz–Thompson principle, f_d_2 is a valid estimate of the population distribution function. But it is not exactly equal to the population distribution function, because it is computed from a sample. The Kolmogorov–Smirnov statistic should therefore not be zero; it should be a small value.
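The inverse-probability argument can be checked by simulation. Below is a hedged base-R sketch that reuses a hand-rolled `weighted_ecdf()` (a hypothetical helper, redefined so the block stands alone; it is not spatstat's ewcdf). One assumption differs from the question's code: the sample is drawn with replacement, so each draw selects unit k with probability p_k = d_k / sum(d) and 1/p_k is the appropriate inverse-probability (Hansen–Hurwitz) weight. When sampling without replacement, as in the question, the inclusion probabilities are not exactly proportional to w_1 (at size 1000 every unit is included with probability 1), so 1/w_1 is only an approximation there. The resulting distance is small but not zero.

```r
# Hypothetical helper (not a spatstat function): normalised weighted ECDF
weighted_ecdf <- function(x, w = rep(1, length(x))) {
  o <- order(x)
  xs <- x[o]
  cw <- cumsum(w[o]) / sum(w)
  function(t) c(0, cw)[findInterval(t, xs) + 1]
}

set.seed(1)
d_1 <- rpois(1000, 500)   # reference population, as in the question
p_1 <- d_1 / sum(d_1)     # per-draw selection probability of each unit

# Assumption: sample WITH replacement, so every draw picks unit k with
# probability p_1[k] and 1 / p_1[k] is the inverse-probability weight.
n <- 500
idx <- sample(length(d_1), size = n, replace = TRUE, prob = p_1)
d_2 <- d_1[idx]
w_2 <- 1 / p_1[idx]

F_pop <- weighted_ecdf(d_1)       # population CDF: every unit has equal weight
F_ht  <- weighted_ecdf(d_2, w_2)  # inverse-probability weighted sample ECDF

# KS distance, evaluated at every value in the population (the jump points)
support <- sort(unique(d_1))
ks_ht <- max(abs(F_pop(support) - F_ht(support)))
ks_ht   # small, but not zero
```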