Tags: r, dataframe, statistics, pairwise.wilcox.test

Apply wilcox.test to all paired columns in a dataframe


For my M.Sc. project I am trying to order all columns of a user-supplied dataframe by their median, apply wilcox.test to the columns following a specific scheme (described below), and then plot each column's values in a box-and-whisker plot.

The ordering and the plotting work just fine, but I am having trouble finding a way to apply wilcox.test to the dataframe using the following scheme:

wilcox.test(i, j, paired=TRUE)

where i=1 and j=2, with both incrementing until j=ncol(dataframe). So I want to run the function on columns 1 and 2, then on columns 2 and 3, and so on, until j is the last column of the dataframe.

I also want to store all the p-values in a dataframe with one row (containing the p-values), with each column named after the two columns that were passed to the corresponding wilcox.test call. I don't only want to plot all the columns (each representing a "solution"); I also want to print the p-value for each test to the console (something like: "Wilcoxon-test with 'Solution1' and 'Solution2' resulted in the p-value: 'p-value from wilcox.test of Solution1 and Solution2', which means the solutions are/aren't significantly different").

I tried to adapt code from other posts about this problem, but nothing worked. Unfortunately I am also very inexperienced in R, so I hope that what I wrote above wasn't utter bullsh*t. I also tried to iterate over the columns of the dataframe with for-loops and increments in a Java-like manner, as Java is the only programming language I was taught, but that didn't work at all (what a surprise).

[Figure: the plot my code creates from a dataframe with very different values]

Thanks for any advice you can give me, it's very much appreciated!


Solution

  • Seems like a job for the matrixTests package. Here is a demonstration using the iris dataset:

    library(matrixTests)
    col_wilcoxon_twosample(iris[,1:3], iris[,2:4])
    
                 obs.x obs.y obs.tot statistic       pvalue alternative location.null exact corrected
    Sepal.Length   150   150     300   22497.5 9.812123e-51   two.sided             0 FALSE      TRUE
    Sepal.Width    150   150     300    7793.5 4.151103e-06   two.sided             0 FALSE      TRUE
    Petal.Length   150   150     300   19348.5 3.735718e-27   two.sided             0 FALSE      TRUE
    

    The returned results match those of wilcox.test() run on each pair of columns (unpaired, as in the call above). For example, for the 1st vs. 2nd columns:

    w <- wilcox.test(iris[,1], iris[,2])
    
    w$p.value
    [1] 9.812123e-51
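
Since the question asks for paired = TRUE, for a one-row dataframe of p-values and for a console message per test, here is a minimal base-R sketch of that scheme. It is only an illustration: the helper names adjacent_wilcox() and report_pvalues(), the "_vs_" naming and the 0.05 threshold are assumptions and do not come from the posts above. The columns are assumed to be numeric, of equal length and already in the desired order. If I recall correctly, matrixTests also offers col_wilcoxon_paired() as the paired counterpart of the function used above, but the sketch sticks to base R so it runs without extra packages.

```r
# Hypothetical helper (not from the original posts): paired Wilcoxon
# signed-rank test on each pair of adjacent columns (1 vs 2, 2 vs 3, ...).
adjacent_wilcox <- function(df) {
  stopifnot(is.data.frame(df), ncol(df) >= 2)
  i <- seq_len(ncol(df) - 1)  # index of the left column of each pair
  p <- vapply(i, function(k) {
    # suppressWarnings() hides the "cannot compute exact p-value" warnings
    # that wilcox.test() raises for ties or zero differences
    suppressWarnings(
      wilcox.test(df[[k]], df[[k + 1]], paired = TRUE)$p.value
    )
  }, numeric(1))
  # One-row data frame; each column is named after the two columns compared
  res <- as.data.frame(t(p))
  names(res) <- paste(names(df)[i], names(df)[i + 1], sep = "_vs_")
  rownames(res) <- "p.value"
  res
}

# Hypothetical helper: print one line per test in the style the question asks for.
# alpha = 0.05 is an assumed significance threshold.
report_pvalues <- function(pvals, alpha = 0.05) {
  for (nm in names(pvals)) {
    p <- pvals[[nm]]
    verdict <- if (isTRUE(p < alpha)) "are" else "aren't"
    cat(sprintf(
      "Wilcoxon-test with %s resulted in the p-value: %.3g, which means the solutions %s significantly different\n",
      sub("_vs_", " and ", nm, fixed = TRUE), p, verdict
    ))
  }
  invisible(pvals)
}

# Usage with the same demo data as above
pvals <- adjacent_wilcox(iris[, 1:4])
report_pvalues(pvals)
```

Using vapply() over the column indices replaces the Java-style counter loop from the question: each index k stands for the pair (k, k + 1), so the last pair ends at ncol(df) automatically.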