I have a column of data from which I am taking random 50% subsamples.
I'm running a two-sided Kolmogorov-Smirnov (KS) test to compare the distribution of the 50% subsample against the distribution of the full data, to see whether the subsample still fits the full data.
To meet my objectives, I want to run this in a loop of, say, 1000 iterations and get an average p-value from 1000 random subsamples. This code gives me a single p-value for a random 50% subset of my sample:
dat50=dat[sample(nrow(dat),replace=F,size=0.50*nrow(dat)),]
ks.test(dat[,1],dat50[,1], alternative="two.sided")
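For anyone who wants to reproduce this, here is a minimal self-contained sketch of the single-test step. The real `dat` is not shown, so it is replaced by a hypothetical simulated data frame; `floor()` keeps the subsample size an integer, and `drop = FALSE` keeps the result a data frame even when `dat` has only one column.

```r
# Hypothetical stand-in for the question's `dat` (the real data is not shown).
set.seed(42)
dat <- data.frame(value = rnorm(200))

# Draw a random 50% subsample of rows without replacement.
# drop = FALSE keeps a data frame even though dat has a single column.
dat50 <- dat[sample(nrow(dat), size = floor(0.5 * nrow(dat)), replace = FALSE), , drop = FALSE]

# Two-sided two-sample KS test: full column vs. subsample column.
# ks.test() may warn about ties, because every subsample value also occurs
# in the full column; suppressWarnings() silences that warning here.
res <- suppressWarnings(ks.test(dat[, 1], dat50[, 1], alternative = "two.sided"))
p_single <- res$p.value
p_single
```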
I need code that runs this 1000 times, saving the resulting (different) p-value from each run in a vector that I can then average. The code I'm trying to get to work looks like this:
x <- numeric(100)
for (i in 1:100){
x<- ks.test(dat[,7],dat50[,7], alternative="two.sided")
x<-x$p.value
}
However, this does not store multiple p-values.
I also tried this:
get.p.value <- function(df1, df2) {
x <- rf(5, df1=df1, df2=df2)
p.value <- ks.test(dat[,6],dat50[,6], alternative="two.sided")$p.value
}
replicate (2000, get.p.value(df1 = 5, df2 = 10))
I hope that is clear; I would really appreciate any help solving this!
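In your for loop you are overwriting x in each iteration, so only the p-value from the last iteration is kept. In addition, dat50 is drawn once, before the loop, so every iteration would test the same subsample anyway; the subsample has to be drawn inside the loop. Try this instead: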
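x <- numeric(100)   # preallocate one slot per iteration
for (i in seq_along(x))
  # draw a fresh 50% subsample each time and store its p-value in x[i]
  x[i] <- ks.test(dat[,7], dat[sample(nrow(dat), replace=FALSE, size=0.5*nrow(dat)),7])$p.value
mean(x)             # average p-value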
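The same loop as a self-contained sketch, runnable without the original data. The two-column `dat` is a hypothetical simulated stand-in, and `col = 1` stands in for column 7 of the real data. The key points are the preallocated vector, indexing `x[i]` instead of assigning to `x`, and drawing the subsample inside the loop.

```r
# Hypothetical stand-in for the real `dat`.
set.seed(1)
dat <- data.frame(a = rnorm(200), b = runif(200))

n_iter <- 100
col    <- 1                       # column to test (column 7 in the real data)
half   <- floor(0.5 * nrow(dat))  # subsample size, kept an integer
x      <- numeric(n_iter)         # preallocate one slot per iteration

for (i in seq_len(n_iter)) {
  # Fresh subsample on every iteration, so each p-value can differ.
  idx <- sample(nrow(dat), size = half, replace = FALSE)
  # Write into slot i instead of overwriting x; ties warnings are silenced.
  x[i] <- suppressWarnings(ks.test(dat[, col], dat[idx, col])$p.value)
}

mean(x)   # average p-value over all subsamples
```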
You can get the same result using replicate, which re-evaluates the whole expression, including the sample() call, on every repetition:
replicate(100, ks.test(dat[,7], dat[sample(nrow(dat), replace=FALSE, size=0.5*nrow(dat)),7])$p.value)
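The replicate approach can also be wrapped in a small function, which is roughly what the second attempt in the question was aiming for. The helper `subsample_ks_p()` and the simulated `dat` are hypothetical, for illustration only. Unlike that attempt, the helper draws the subsample inside the function body and has no unused arguments, so each call returns the p-value for a new subsample.

```r
# Hypothetical helper: KS p-value for one fresh random subsample of column
# `col` of data frame `d`, taking a fraction `frac` of the rows.
subsample_ks_p <- function(d, col, frac = 0.5) {
  idx <- sample(nrow(d), size = floor(frac * nrow(d)), replace = FALSE)
  # ks.test() may warn about ties between the subsample and the full column.
  suppressWarnings(ks.test(d[, col], d[idx, col], alternative = "two.sided")$p.value)
}

# Hypothetical stand-in for the real `dat`.
set.seed(123)
dat <- data.frame(v = rexp(300))

# replicate() evaluates the call 1000 times, each with a new subsample.
p <- replicate(1000, subsample_ks_p(dat, col = 1))

mean(p)      # average p-value over the 1000 subsamples
summary(p)   # spread of the p-values
```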