Tags: r, p-value, kolmogorov-smirnov

Two-sided KS test in a loop: get the p.value from each iteration


I have a column of data from which I take random 50% subsamples. I'm running a two-sided Kolmogorov-Smirnov (KS) test to compare the distribution of the 50% subsample against the full data (100%), to check whether the subsample's distribution still fits that of the full data.

To meet my objectives, I want to run this in a loop of, say, 1000 iterations and get the average p-value across 1000 random subsamples. These lines of code give me a single p-value for one random 50% subset of my sample:

    dat50 <- dat[sample(nrow(dat), replace = FALSE, size = 0.50 * nrow(dat)), ]
    ks.test(dat[, 1], dat50[, 1], alternative = "two.sided")
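
A self-contained version of this single-test step may help anyone who wants to reproduce it. This is a sketch that assumes a simulated one-column data frame in place of the real dat (which is not shown); the column name value is hypothetical. Note drop = FALSE: without it, subsetting the rows of a one-column data frame returns a plain vector and dat50[, 1] would fail.

```r
# Hypothetical stand-in for `dat`: the original data are not shown,
# so simulate 200 rows with a single numeric column.
set.seed(42)
dat <- data.frame(value = rnorm(200))

# Draw a random 50% subsample of the rows, without replacement.
idx <- sample(nrow(dat), size = floor(0.5 * nrow(dat)), replace = FALSE)
dat50 <- dat[idx, , drop = FALSE]  # keep it a data frame

# Two-sided two-sample KS test: full column vs. subsample column.
# ks.test() may warn that the p-value is approximate because of ties
# (every subsample value also occurs in the full column); silenced here.
res <- suppressWarnings(
  ks.test(dat[, 1], dat50[, 1], alternative = "two.sided")
)
p_single <- res$p.value
print(p_single)
```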

I need code that runs this 1000 times and stores each resulting (different) p-value in a vector, which I can then average. The code I'm trying to get to work looks like this:

    x <- numeric(100)
    for (i in 1:100){
      x <- ks.test(dat[,7], dat50[,7], alternative="two.sided")
      x <- x$p.value
    }

However, this does not store multiple p-values; x ends up holding only one.

I also tried this:

    get.p.value <- function(df1, df2) {
      x <- rf(5, df1=df1, df2=df2)
      p.value <- ks.test(dat[,6], dat50[,6], alternative="two.sided")$p.value
    }
    replicate(2000, get.p.value(df1 = 5, df2 = 10))

I hope that is clear; I would really appreciate any help solving this!



Solution

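  • In your for loop you overwrite x on each iteration, so only the p-value from the last iteration is kept; assign into x[i] instead. The subsample also has to be redrawn inside the loop, otherwise every iteration tests the same dat50 and returns the same p-value. Try this instead: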

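    x <- numeric(100)        # one slot per iteration
    for (i in seq_along(x))  # new 50% subsample each time, p-value stored in x[i]
        x[i] <- ks.test(dat[, 7], dat[sample(nrow(dat), replace = FALSE, size = 0.5 * nrow(dat)), 7])$p.value
    mean(x)                  # average p-value over all iterations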

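    You can get the same result with replicate(), which re-evaluates the whole expression, and so draws a new subsample, on each of the 100 calls:

    replicate(100, ks.test(dat[, 7], dat[sample(nrow(dat), replace = FALSE, size = 0.5 * nrow(dat)), 7])$p.value)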
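
The same idea can be packaged as a small function so the column, subsample fraction and number of repetitions are easy to change. This is a sketch, not the answer's exact code: dat is simulated because the original is not shown, and the helper ks_subsample_p is a hypothetical name introduced here. It uses 1000 repetitions, as the question asks.

```r
# Hypothetical data: `dat` is simulated because the original is not shown.
set.seed(1)
dat <- data.frame(a = rnorm(150), b = runif(150))

# Hypothetical helper: KS p-value for one fresh random subsample.
# `col` is the column (name or index), `frac` the subsample fraction.
ks_subsample_p <- function(data, col, frac = 0.5) {
  idx <- sample(nrow(data), size = floor(frac * nrow(data)), replace = FALSE)
  full <- data[[col]]
  # Ties between subsample and full column can trigger an
  # "approximate p-value" warning; it is silenced here.
  suppressWarnings(
    ks.test(full, full[idx], alternative = "two.sided")$p.value
  )
}

# Each call draws a new subsample; keep all 1000 p-values.
p_values <- replicate(1000, ks_subsample_p(dat, col = "a"))

summary(p_values)  # spread of the p-values
mean(p_values)     # the average p-value asked for in the question
```

Keeping the full vector rather than only its mean lets you also look at the spread (for example with summary() or hist()) before settling on a single average.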