r  p-value  kolmogorov-smirnov

Calculating a statistic for multiple studies in R


I have a data set to which I want to apply several tests, such as the two-sample KS (Kolmogorov-Smirnov) test. I'm trying to find a way to apply the two-sample KS test to every pair of samples. The basic idea is:

Let's say I have a data set with these observations:

       1 2 3 4
Study1 9 1 2 6
Study2 5 6 7 8
Study3 4 3 2 1
Study4 8 7 6 5
Study5 1 3 5 7
Study6 2 4 6 8
Study7 1 3 6 9
Study8 2 4 7 1
Study9 2 5 8 4
Study10 3 6 8 5

I could apply the KS test to every pair of studies, one call at a time, with the following:

ks.test(as.numeric(as.vector(df[1,])),as.numeric(as.vector(df[1,])))
ks.test(as.numeric(as.vector(df[1,])),as.numeric(as.vector(df[2,])))
ks.test(as.numeric(as.vector(df[1,])),as.numeric(as.vector(df[3,])))
                                   ...
ks.test(as.numeric(as.vector(df[1,])),as.numeric(as.vector(df[10,])))
ks.test(as.numeric(as.vector(df[2,])),as.numeric(as.vector(df[1,])))
                                   ...
ks.test(as.numeric(as.vector(df[10,])),as.numeric(as.vector(df[10,])))

This would result in 10 x 10 = 100 p-values, and my aim is to use them as a measure of distance.

So I'm looking for a way to run the KS test on all n x n pairs of samples and output the p-values in an n x n matrix.


Solution

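  • You are looking for outer. Because outer calls its function once with whole vectors of index pairs rather than once per pair, the scalar function of i and j has to be wrapped in Vectorize: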

    outer(1:10, 1:10, Vectorize(function(i,j) {ks.test(as.numeric(as.vector(df[i,])),as.numeric(as.vector(df[j,])))$p.value}))
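For reference, here is a self-contained sketch of the same approach that can be run as is. It rebuilds the example data from the question and converts it to a numeric matrix so that each row can be passed to ks.test directly. The names m, n, ks_p and pvals are illustrative and not part of the original answer, and suppressWarnings is only an assumption about how you may want to handle the tie warnings that some R versions emit for data like this.

```r
# Example data from the question: 10 studies (rows), 4 observations each.
df <- data.frame(
  matrix(c(9, 1, 2, 6,
           5, 6, 7, 8,
           4, 3, 2, 1,
           8, 7, 6, 5,
           1, 3, 5, 7,
           2, 4, 6, 8,
           1, 3, 6, 9,
           2, 4, 7, 1,
           2, 5, 8, 4,
           3, 6, 8, 5),
         ncol = 4, byrow = TRUE),
  row.names = paste0("Study", 1:10)
)

# A numeric matrix makes row extraction return a plain numeric vector,
# so as.numeric(as.vector(...)) is no longer needed.
m <- as.matrix(df)
n <- nrow(m)

# Hypothetical helper: p-value of the two-sample KS test between rows i and j.
# suppressWarnings hides the tie warnings that some R versions raise here.
ks_p <- function(i, j) {
  suppressWarnings(ks.test(m[i, ], m[j, ])$p.value)
}

# outer passes vectors of indices, so the scalar helper is wrapped in Vectorize.
pvals <- outer(seq_len(n), seq_len(n), Vectorize(ks_p))
dimnames(pvals) <- list(rownames(m), rownames(m))

print(round(pvals, 3))
```

Note that a large p-value indicates that two samples look similar, so if the matrix is meant to serve as a distance, something like 1 - pvals (or the KS statistic itself, via $statistic) may be closer to what you want.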