Running a paired Wilcoxon test on rows of two dataframes


I have two large dataframes (around 19000 rows and 71 columns each), as follows:

df1

      sample1 sample2 sample3
gene1       5      10      15
gene2       2       8      10
gene3       3       9      10

df2

      sample1 sample2 sample3
gene1      40      50      65
gene2      12      18       0
gene3      31      19      10

I am trying to perform a paired Wilcoxon (signed-rank) test on the rows with the same index, but the code is taking forever on Google Colab. My code so far:

wilc_results= c()
for( x in 1:nrow(df1)){
  for (y in 1:nrow(df2)){
    result= wilcox.test(as.numeric(df2[y,]), as.numeric(df1[x,]), 
                        alternative= 'two.sided', paired= T )
    wilc_results[length(wilc_results) + 1] <- result$p.value
  }
}

Is there a much faster way to get the desired output?


Solution

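  • There is no need to loop twice. You only want to compare rows that share the same index, but the nested loop tests every row of df1 against every row of df2 (roughly 19000 × 19000 ≈ 361 million tests). Since both data frames have the same dimensions, a single loop over the row index is enough. The loop below runs in about 10 seconds on a similarly sized dataset on my computer.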

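    # One p-value per gene, stored in a preallocated list
    wilc_results <- vector("list", nrow(df1))
    for(i in seq_len(nrow(df1))) {
      # Compare row i of df1 with row i of df2 only
      result <- wilcox.test(as.numeric(df1[i,]), as.numeric(df2[i,]),
                            alternative='two.sided', paired=TRUE)
      wilc_results[[i]] <- result$p.value
    }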
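
As a possible alternative to the explicit loop, here is a minimal sketch that uses vapply() on matrices. The toy data (random values, 5 genes by 8 samples) and the names m1, m2 and p_values are assumptions for illustration, not part of the original question. Converting each data frame to a matrix once avoids coercing a data frame row with as.numeric() on every iteration, which can be slow. I have not benchmarked this on data of the real size.

```r
# Toy data in the shape described in the question:
# genes in rows, samples in columns (values are made up).
set.seed(1)
genes   <- paste0("gene", 1:5)
samples <- paste0("sample", 1:8)
df1 <- as.data.frame(matrix(rnorm(40, mean = 10), nrow = 5,
                            dimnames = list(genes, samples)))
df2 <- as.data.frame(matrix(rnorm(40, mean = 12), nrow = 5,
                            dimnames = list(genes, samples)))

# Convert once; indexing a matrix row returns a plain numeric vector.
m1 <- as.matrix(df1)
m2 <- as.matrix(df2)
stopifnot(identical(dim(m1), dim(m2)))

# One paired test per row index; vapply() returns a numeric vector.
p_values <- vapply(seq_len(nrow(m1)), function(i) {
  wilcox.test(m1[i, ], m2[i, ],
              alternative = "two.sided", paired = TRUE)$p.value
}, numeric(1))
names(p_values) <- rownames(m1)
```

Here p_values is a named numeric vector, so the result for a single gene can be looked up with p_values["gene1"].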