I am using the R package doParallel to parallelize some steps of my scripts when I have to process large lists of elements. Until now, every function I used worked perfectly well with foreach(): I just had to specify my number of cores with registerDoParallel() and that was all!
I recently tried to run some statistical tests in R using var.test() and t.test(), and I don't understand why, but inside foreach() they don't run in parallel... To be clear, what I am basically doing is iterating over the rows of 2 matrices of the same dimensions: each row, in each matrix, contains 5 numeric values, and I do for example:
var.test(matrixA[1,],matrixB[1,])$p.value
to extract, for row number 1, the corresponding p-value from the F test run on 10 numeric values (2 groups of 5 values, one from row number 1 of each matrix). The problem is that my matrices have millions of rows, so I have to iterate over all of them, and I do this with the foreach() function:
p.values.res <- foreach(i = seq_len(nrow(matrixA))) %dopar%
  var.test(matrixA[i, ], matrixB[i, ])$p.value
(Here I call registerDoParallel(cores = 6) before the foreach().) I tried different tests, the F test (var.test()) and Student's t-test (t.test()), and unfortunately none of them ran on my 6 cores: only one core was used.
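For reference, the same computation can be expressed in row chunks, one block per worker, which reduces the per-iteration scheduling overhead that can otherwise keep most cores idle when each task is tiny. This is only a sketch, and it assumes the itertools package is installed; the small matrices stand in for my real matrixA / matrixB:

```r
## Sketch: chunked row-wise var.test with doParallel + itertools (assumed installed).
library(doParallel)
library(itertools)

registerDoParallel(cores = 2)

## Small stand-ins for the real matrixA / matrixB
matrixA <- matrix(runif(60), nrow = 12)
matrixB <- matrix(runif(60), nrow = 12)

## isplitRows() yields one block of rows per chunk; each worker then loops
## locally over its block, and .combine = "c" concatenates the p-values.
p.values.res <- foreach(a = isplitRows(matrixA, chunks = 2),
                        b = isplitRows(matrixB, chunks = 2),
                        .combine = "c") %dopar% {
  sapply(seq_len(nrow(a)), function(i) var.test(a[i, ], b[i, ])$p.value)
}
```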
I also tried creating the backend with "cl": registerDoParallel(cl = 4). It doesn't work either.
I tried restarting R, quitting and reopening the session, and restarting the computer: nothing works.
Does anybody know why this does not work, and how to fix it?
My configuration: Linux Mint 18.2 Cinnamon 64-bit (3.4.6); Intel Core i7-6700 CPU; R version 3.4.3 (2017-11-30); RStudio 1.1.383.
Here are 2 short example matrices:
MatrixA:
0.7111111 0.7719298 0.7027027 0.6875000 0.6857143
0.8292683 0.6904762 0.8222222 0.8333333 0.6250000
0.8846154 0.5714286 0.8928571 0.8846154 0.9259259
0.9000000 0.5000000 0.9500000 0.8666667 0.8260870
0.8235294 0.3684211 0.9411765 0.8333333 0.8000000
0.5714286 0.2142857 0.6666667 0.5000000 0.5555556
MatrixB:
0.5227273 0.7142857 0.7808219 0.6346154 0.7362637
0.9166667 0.7173913 0.8611111 0.7391304 0.7538462
0.8666667 0.6052632 0.8260870 0.7333333 0.9024390
0.9285714 0.5806452 0.8750000 0.6956522 0.8787879
0.8333333 0.5517241 0.8333333 0.6818182 0.8750000
0.7500000 0.2941176 0.6666667 0.4444444 0.7500000
Thank you all in advance for your help. Regards,
Unfortunately I didn't find a solution to my problem with doParallel, but I realized that I didn't need it in the first place.
In the R package "genefilter" I found an alternative: the function rowttests(), which is really fast for running t-tests over the rows of a large matrix. My only reservation is that it assumes equal variances when computing p-values (and you can't change that). Fortunately, that is my case.
So I just had to cbind() my 2 matrices and specify group membership with a factor over the columns. And that's all!
library(genefilter)

bind_matrix <- cbind(matrixA, matrixB)
fact <- factor(rep(c("A", "B"), each = 5))
p.vals <- rowttests(bind_matrix, fact)$p.value
It takes only a few seconds; I tried it on a matrix with 10 million rows.
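As a quick sanity check, the rowttests() result should match a row-by-row t.test(..., var.equal = TRUE), since both assume equal variances. A sketch, assuming genefilter (Bioconductor) is installed, with small random matrices standing in for the real data:

```r
## Sketch: verify rowttests() against the base-R equal-variance t-test.
library(genefilter)

matrixA <- matrix(runif(30), nrow = 6)  # stand-in data
matrixB <- matrix(runif(30), nrow = 6)

bind_matrix <- cbind(matrixA, matrixB)
fact <- factor(rep(c("A", "B"), each = 5))

## Fast, vectorized row-wise t-tests
fast_p <- rowttests(bind_matrix, fact)$p.value

## Slow reference: one t.test() per row, pooled variance
slow_p <- sapply(seq_len(nrow(bind_matrix)),
                 function(i) t.test(matrixA[i, ], matrixB[i, ],
                                    var.equal = TRUE)$p.value)

all.equal(fast_p, slow_p)  # the two should agree to numerical tolerance
```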
genefilter also provides rowFtests() with the same interface. Note, though, that it computes a one-way ANOVA F test across the groups (for two groups this is equivalent to the equal-variance t-test), not the variance-ratio F test performed by var.test().
So now I am looking for a similarly fast solution for Wilcoxon tests. If someone knows a function that works like these ones, please comment.
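In the meantime, a simple base-R baseline is to call wilcox.test() once per row; fine for prototyping, but far too slow for millions of rows (with only 5 values per group, exact p-values are used and ties would trigger warnings):

```r
## Sketch: naive row-wise two-sample Wilcoxon test in base R.
matrixA <- matrix(runif(30), nrow = 6)  # stand-in data
matrixB <- matrix(runif(30), nrow = 6)

wilcox_p <- vapply(seq_len(nrow(matrixA)),
                   function(i) wilcox.test(matrixA[i, ], matrixB[i, ])$p.value,
                   numeric(1))
```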