Tags: r, dataframe, optimization, statistics, memory-efficient

How to run a Wilcoxon test on all combinations of groups across a large number of features in R?


I have a large sparse matrix (let's call it matrix) in which the rows are the features and the columns are the samples. Each column/sample belongs to 1 of 6 groups. I randomly sample a number of columns from each group and store their indices in the original matrix.

# Draw the samples once; [[i]] extracts the index vector itself (a single [i] returns a one-element list)
sampled <- Map(sample, row_index, num_sample)
astro_index  <- sampled[[1]]
endo_index   <- sampled[[2]]
micro_index  <- sampled[[3]]
neuron_index <- sampled[[4]]
oligo_index  <- sampled[[5]]
opc_index    <- sampled[[6]]

The goal is to perform a Wilcoxon test and get the p-value for every pairwise combination of the 6 groups, for each feature. The big issue is that I have over 30,000 features to test on all combinations of the 6 groups (that is 15 comparisons for each of the 30,000+ features).
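As a quick check of the bookkeeping, the 15 pairs can be enumerated with combn. This is only a sketch: the group labels are the column names used in the data frame further down, and 30,000 is the lower bound on the feature count mentioned above.

```r
# Group labels (taken from the data frame column names used later in the question)
groups <- c("Astro", "Endo", "Micro", "Neuron", "Oligo", "OPC")

# Each column of `pairs` holds one unordered pair of groups
pairs <- combn(groups, 2)
pair_names <- apply(pairs, 2, paste, collapse = "-")

n_pairs <- ncol(pairs)        # number of pairwise comparisons per feature
n_tests <- n_pairs * 30000    # lower bound on the total number of tests
```

The order of pair_names follows combn, which is the same order as the sample output of result shown below.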

I currently have two methods. The first one uses the apply function and handles only one comparison (here the astro and neuron groups). The disadvantages are that I run into memory issues and that it only does 1 comparison at a time, so I would have to write this 14 more times to get all possible comparisons.

store_p <- apply(matrix,1,function(x) {wilcox.test(x[astro_index],x[neuron_index])$p.value }) 
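One possible way to extend this first method to all 15 pairs while keeping memory in check is to convert only a block of rows to a dense matrix at a time. The sketch below is not the code from the question: the toy matrix, the idx list (standing in for astro_index, neuron_index and so on) and chunk_size are hypothetical, and it assumes the memory problem comes from apply() coercing the whole sparse matrix to a dense one.

```r
library(Matrix)

set.seed(1)
groups <- c("Astro", "Endo", "Micro", "Neuron", "Oligo", "OPC")

# Toy stand-in for the real data (hypothetical sizes): 200 features x 60 samples
mat <- rsparsematrix(200, 60, density = 0.5)
# Toy stand-in for the sampled column indices: 10 columns per group
idx <- setNames(lapply(seq_along(groups), function(g) (g - 1) * 10 + 1:10), groups)

pairs <- combn(groups, 2)                    # 2 x 15 matrix of group pairs
keep  <- unlist(idx, use.names = FALSE)      # only the sampled columns are needed
# Positions of each group's columns inside the column subset mat[, keep]
pos   <- split(seq_along(keep), factor(rep(groups, lengths(idx)), levels = groups))

chunk_size <- 50                             # rows densified at once (assumed value)
starts <- seq(1, nrow(mat), by = chunk_size)

pvals <- do.call(rbind, lapply(starts, function(s) {
  rows  <- s:min(s + chunk_size - 1, nrow(mat))
  dense <- as.matrix(mat[rows, keep, drop = FALSE])  # densify one chunk only
  # apply() returns pairs in rows, so transpose to features x pairs
  t(apply(dense, 1, function(x) {
    apply(pairs, 2, function(p) {
      # exact = FALSE avoids tie warnings; a feature that is constant
      # across both groups may yield NaN
      wilcox.test(x[pos[[p[1]]]], x[pos[[p[2]]]], exact = FALSE)$p.value
    })
  }))
}))
colnames(pvals) <- apply(pairs, 2, paste, collapse = "-")
```

pvals then has one row per feature and one column per comparison, so pvals[, "Astro-Neuron"] plays the role of store_p. This only bounds memory use; it is still one wilcox.test call per feature and pair, so it is not expected to be fast.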

The second method uses a for loop to go through all the features. For each feature, it builds a data frame with one column per group and uses combn to calculate the p-value for every pair of groups. This method is really slow but does not crash.

hold_list <- list()  # collects one character vector of results per feature
for (i in features){
  df <- data.frame('Astro' = matrix[i,astro_index], 'Endo' = matrix[i,endo_index], 'Micro' = matrix[i,micro_index], 'Neuron' = matrix[i,neuron_index], 'Oligo' = matrix[i,oligo_index], 'OPC' = matrix[i,opc_index])
  result <- combn(names(df), 2, FUN = function(x) paste(paste(x, collapse='-'), wilcox.test(df[,x[1]], df[,x[2]], paired = TRUE)$p.value, sep=" : ")) 
  hold_list <- append(hold_list, list(result))
}

To give a sense of what the result looks like, here is a sample output of result:

> result
 [1] "Astro-Endo : 0.115331575924872"      "Astro-Micro : 0.935664046257304"     "Astro-Neuron : 0.0271849565394441"  
 [4] "Astro-Oligo : 0.00147694402781699"   "Astro-OPC : 0.0476580762532988"      "Endo-Micro : 0.297672151508384"     
 [7] "Endo-Neuron : 2.38134038927696e-06"  "Endo-Oligo : 0.0323129112432441"     "Endo-OPC : 0.451258974150342"       
[10] "Micro-Neuron : 0.000143621746738224" "Micro-Oligo : 0.0178171887595787"    "Micro-OPC : 0.0692129715131915"     
[13] "Neuron-Oligo : 6.68255453156116e-10" "Neuron-OPC : 6.201108273594e-07"     "Oligo-OPC : 0.142213241936393"  

Ideally, I would like the best of both methods: a more efficient process for computing these tests. If there is a suggestion for designing a different data frame altogether to tackle this task in one go, I would appreciate that too.

EDIT: I realized I did not make this clear, but the result shown is for all combinations of only one feature. I have a for loop that goes through all the features. In essence, a p-value should be calculated for every feature and every combination.


Solution

  • I would use pairwiseWilcox from the scran package for that; it seems ideally suited to your problem. It performs pairwise Wilcoxon rank sum tests for each row between groups of columns, where groups is a vector that assigns each column to a group.

    Edit:

    • Sampled groups to have equal number of elements (columns), since the OP seems to want that.
    • Made the matrix less sparse to make it clearer that it does not compare individual values, but groups of values for each row.

    Example:

    library(Matrix)
    types <- c("Astro", "Neuron", "Endo", "Oligo", "OPC", "Micro")
    
    # generate sparse matrix
    set.seed(123)
    mat <- Matrix(0, nrow = 10000, ncol = 1000, sparse = TRUE)
    mat[sample(seq_along(mat), 1E5)] <- runif(n = 1e5, min = 0, max=100)
    groups <- c(rep(types, each = floor(ncol(mat)/6)), rep("Micro", ncol(mat) %% 6))
    colnames(mat) <- make.unique(groups)
    
    # sample n=100 samples of each group
    idx <- setNames(lapply(types, function(x) grep(x, colnames(mat))), types)
    smp <- Map(sample, idx, size = 100)
    groups <- gsub("[0-9]+", "", names(unlist(smp)))
    
    # subset mat to sampled columns
    mat <- mat[, unlist(smp, use.names = FALSE)]
    
    library(scran)
    
    pwt <- pairwiseWilcox(mat, groups = groups)
    pwt
    #> $statistics
    #> $statistics[[1]]
    #> DataFrame with 10000 rows and 3 columns
    #>             AUC   p.value       FDR
    #>       <numeric> <numeric> <numeric>
    #> 1       0.49995  1.000000  1.000000
    #> 2       0.51000  0.158341  0.616668
    #> 3       0.49000  0.158341  0.616668
    #> 4       0.50490  0.573540  0.856541
    #> 5       0.48985  0.308851  0.616668
    #> ...         ...       ...       ...
    #> 9996     0.4950  0.565662  0.856541
    #> 9997     0.5050  0.322174  0.616668
    #> 9998     0.4951  0.573540  0.856541
    #> 9999     0.4950  0.322174  0.616668
    #> 10000    0.5050  0.322174  0.616668
    #> 
    #> $statistics[[2]]
    #> DataFrame with 10000 rows and 3 columns
    #>             AUC   p.value       FDR
    #>       <numeric> <numeric> <numeric>
    #> 1        0.5050 0.3221741  0.613045
    #> 2        0.5049 0.5735395  0.858464
    #> 3        0.4800 0.0444225  0.613045
    #> 4        0.4947 0.6352736  0.948311
    #> 5        0.4949 0.5578376  0.858464
    #> ...         ...       ...       ...
    #> 9996    0.49500  0.565662  0.858464
    #> 9997    0.50005  1.000000  1.000000
    #> 9998    0.50500  0.322174  0.613045
    #> 9999    0.50000  1.000000  1.000000
    #> 10000   0.50500  0.322174  0.613045
    #> 
    #> $statistics[[3]]
    #> DataFrame with 10000 rows and 3 columns
    #>             AUC   p.value       FDR
    #>       <numeric> <numeric> <numeric>
    #> 1        0.5050  0.322174  0.605697
    #> 2        0.5001  0.995980  1.000000
    #> 3        0.5000  1.000000  1.000000
    #> 4        0.5100  0.158341  0.605697
    #> 5        0.4949  0.557838  0.854499
    #> ...         ...       ...       ...
    #> 9996    0.50005  1.000000  1.000000
    #> 9997    0.49995  1.000000  1.000000
    #> 9998    0.50005  1.000000  1.000000
    #> 9999    0.49500  0.322174  0.605697
    #> 10000   0.49995  1.000000  1.000000
    #> 
    #> $statistics[[4]]
    #> DataFrame with 10000 rows and 3 columns
    #>             AUC   p.value       FDR
    #>       <numeric> <numeric> <numeric>
    #> 1       0.49995  1.000000  1.000000
    #> 2       0.50010  0.995980  1.000000
    #> 3       0.50000  1.000000  1.000000
    #> 4       0.49490  0.648212  0.959177
    #> 5       0.50500  0.322174  0.615026
    #> ...         ...       ...       ...
    #> 9996     0.4949  0.557838  0.859750
    #> 9997     0.4951  0.573540  0.859750
    #> 9998     0.4852  0.182661  0.615026
    #> 9999     0.5000  1.000000  1.000000
    #> 10000    0.4949  0.557838  0.859750
    #> 
    #> $statistics[[5]]
    #> DataFrame with 10000 rows and 3 columns
    #>             AUC   p.value       FDR
    #>       <numeric> <numeric> <numeric>
    #> 1       0.50500  0.322174  0.620334
    #> 2       0.49480  0.641729  0.964426
    #> 3       0.50000  1.000000  1.000000
    #> 4       0.51000  0.158341  0.620334
    #> 5       0.49995  1.000000  1.000000
    #> ...         ...       ...       ...
    #> 9996    0.50005  1.000000  1.000000
    #> 9997    0.49015  0.323442  0.620334
    #> 9998    0.50005  1.000000  1.000000
    #> 9999    0.49500  0.322174  0.620334
    #> 10000   0.50500  0.322174  0.620334
    #> 
    #> $statistics[[6]]
    #> DataFrame with 10000 rows and 3 columns
    #>             AUC   p.value       FDR
    #>       <numeric> <numeric> <numeric>
    #> 1       0.50005  1.000000  1.000000
    #> 2       0.49000  0.158341  0.616668
    #> 3       0.51000  0.158341  0.616668
    #> 4       0.49510  0.573540  0.856541
    #> 5       0.51015  0.308851  0.616668
    #> ...         ...       ...       ...
    #> 9996     0.5050  0.565662  0.856541
    #> 9997     0.4950  0.322174  0.616668
    #> 9998     0.5049  0.573540  0.856541
    #> 9999     0.5050  0.322174  0.616668
    #> 10000    0.4950  0.322174  0.616668
    #> 
    #> $statistics[[7]]
    #> DataFrame with 10000 rows and 3 columns
    #>             AUC   p.value       FDR
    #>       <numeric> <numeric> <numeric>
    #> 1       0.50500  0.322174  0.616668
    #> 2       0.49500  0.322174  0.616668
    #> 3       0.48960  0.392127  0.746909
    #> 4       0.49005  0.318530  0.616668
    #> 5       0.50500  0.654721  0.960283
    #> ...         ...       ...       ...
    #> 9996     0.5001  0.995980  1.000000
    #> 9997     0.4950  0.322174  0.616668
    #> 9998     0.5100  0.158341  0.616668
    #> 9999     0.5050  0.322174  0.616668
    #> 10000    0.5000  1.000000  1.000000
    #> 
    #> $statistics[[8]]
    #> DataFrame with 10000 rows and 3 columns
    #>             AUC   p.value       FDR
    #>       <numeric> <numeric> <numeric>
    #> 1         0.505  0.322174  0.604226
    #> 2         0.490  0.158341  0.604226
    #> 3         0.510  0.158341  0.604226
    #> 4         0.505  0.322174  0.604226
    #> 5         0.505  0.654721  0.952044
    #> ...         ...       ...       ...
    #> 9996    0.50510  0.557838  0.849437
    #> 9997    0.49500  0.322174  0.604226
    #> 9998    0.50500  0.565662  0.849437
    #> 9999    0.49995  1.000000  1.000000
    #> 10000   0.49500  0.322174  0.604226
    #> 
    #> $statistics[[9]]
    #> DataFrame with 10000 rows and 3 columns
    #>             AUC   p.value       FDR
    #>       <numeric> <numeric> <numeric>
    #> 1       0.49995  1.000000  1.000000
    #> 2       0.49000  0.158341  0.611076
    #> 3       0.51000  0.158341  0.611076
    #> 4       0.49005  0.318530  0.611076
    #> 5       0.51500  0.082748  0.611076
    #> ...         ...       ...       ...
    #> 9996     0.5000  1.000000  1.000000
    #> 9997     0.4900  0.158341  0.611076
    #> 9998     0.4899  0.405995  0.762863
    #> 9999     0.5050  0.322174  0.611076
    #> 10000    0.4900  0.158341  0.611076
    #> 
    #> $statistics[[10]]
    #> DataFrame with 10000 rows and 3 columns
    #>             AUC   p.value       FDR
    #>       <numeric> <numeric> <numeric>
    #> 1       0.50500  0.322174  0.619147
    #> 2       0.48500  0.082748  0.619147
    #> 3       0.51000  0.158341  0.619147
    #> 4       0.50500  0.322174  0.619147
    #> 5       0.50985  0.323442  0.619147
    #> ...         ...       ...       ...
    #> 9996    0.50500  0.565662  0.863244
    #> 9997    0.48500  0.082748  0.619147
    #> 9998    0.50510  0.557838  0.863244
    #> 9999    0.50005  1.000000  1.000000
    #> 10000   0.50000  1.000000  1.000000
    #> 
    #> $statistics[[11]]
    #> DataFrame with 10000 rows and 3 columns
    #>             AUC   p.value       FDR
    #>       <numeric> <numeric> <numeric>
    #> 1        0.4950 0.3221741  0.613045
    #> 2        0.4951 0.5735395  0.858464
    #> 3        0.5200 0.0444225  0.613045
    #> 4        0.5053 0.6352736  0.948311
    #> 5        0.5051 0.5578376  0.858464
    #> ...         ...       ...       ...
    #> 9996    0.50500  0.565662  0.858464
    #> 9997    0.49995  1.000000  1.000000
    #> 9998    0.49500  0.322174  0.613045
    #> 9999    0.50000  1.000000  1.000000
    #> 10000   0.49500  0.322174  0.613045
    #> 
    #> $statistics[[12]]
    #> DataFrame with 10000 rows and 3 columns
    #>             AUC   p.value       FDR
    #>       <numeric> <numeric> <numeric>
    #> 1       0.49500  0.322174  0.616668
    #> 2       0.50500  0.322174  0.616668
    #> 3       0.51040  0.392127  0.746909
    #> 4       0.50995  0.318530  0.616668
    #> 5       0.49500  0.654721  0.960283
    #> ...         ...       ...       ...
    #> 9996     0.4999  0.995980  1.000000
    #> 9997     0.5050  0.322174  0.616668
    #> 9998     0.4900  0.158341  0.616668
    #> 9999     0.4950  0.322174  0.616668
    #> 10000    0.5000  1.000000  1.000000
    #> 
    #> $statistics[[13]]
    #> DataFrame with 10000 rows and 3 columns
    #>             AUC   p.value       FDR
    #>       <numeric> <numeric> <numeric>
    #> 1        0.5000 1.0000000  1.000000
    #> 2        0.4951 0.5735395  0.868341
    #> 3        0.5200 0.0444225  0.625009
    #> 4        0.5150 0.0827480  0.625009
    #> 5        0.5000 1.0000000  1.000000
    #> ...         ...       ...       ...
    #> 9996    0.50500  0.565662  0.868341
    #> 9997    0.49995  1.000000  1.000000
    #> 9998    0.49500  0.322174  0.625009
    #> 9999    0.49500  0.322174  0.625009
    #> 10000   0.49500  0.322174  0.625009
    #> 
    #> $statistics[[14]]
    #> DataFrame with 10000 rows and 3 columns
    #>             AUC   p.value       FDR
    #>       <numeric> <numeric> <numeric>
    #> 1       0.49500 0.3221741  0.606038
    #> 2       0.49510 0.5735395  0.852213
    #> 3       0.52000 0.0444225  0.606038
    #> 4       0.50005 1.0000000  1.000000
    #> 5       0.51000 0.1583409  0.606038
    #> ...         ...       ...       ...
    #> 9996     0.4998 0.9879417  1.000000
    #> 9997     0.4951 0.5735395  0.852213
    #> 9998     0.4800 0.0444225  0.606038
    #> 9999     0.5000 1.0000000  1.000000
    #> 10000    0.4900 0.1583409  0.606038
    #> 
    #> $statistics[[15]]
    #> DataFrame with 10000 rows and 3 columns
    #>             AUC   p.value       FDR
    #>       <numeric> <numeric> <numeric>
    #> 1       0.50000 1.0000000  1.000000
    #> 2       0.49005 0.3185296  0.619978
    #> 3       0.52000 0.0444225  0.619978
    #> 4       0.51500 0.0827480  0.619978
    #> 5       0.50500 0.5656624  0.863114
    #> ...         ...       ...       ...
    #> 9996    0.50500  0.565662  0.863114
    #> 9997    0.49015  0.323442  0.619978
    #> 9998    0.49500  0.322174  0.619978
    #> 9999    0.49500  0.322174  0.619978
    #> 10000   0.50000  1.000000  1.000000
    #> 
    #> $statistics[[16]]
    #> DataFrame with 10000 rows and 3 columns
    #>             AUC   p.value       FDR
    #>       <numeric> <numeric> <numeric>
    #> 1        0.4950  0.322174  0.605697
    #> 2        0.4999  0.995980  1.000000
    #> 3        0.5000  1.000000  1.000000
    #> 4        0.4900  0.158341  0.605697
    #> 5        0.5051  0.557838  0.854499
    #> ...         ...       ...       ...
    #> 9996    0.49995  1.000000  1.000000
    #> 9997    0.50005  1.000000  1.000000
    #> 9998    0.49995  1.000000  1.000000
    #> 9999    0.50500  0.322174  0.605697
    #> 10000   0.50005  1.000000  1.000000
    #> 
    #> $statistics[[17]]
    #> DataFrame with 10000 rows and 3 columns
    #>             AUC   p.value       FDR
    #>       <numeric> <numeric> <numeric>
    #> 1         0.495  0.322174  0.604226
    #> 2         0.510  0.158341  0.604226
    #> 3         0.490  0.158341  0.604226
    #> 4         0.495  0.322174  0.604226
    #> 5         0.495  0.654721  0.952044
    #> ...         ...       ...       ...
    #> 9996    0.49490  0.557838  0.849437
    #> 9997    0.50500  0.322174  0.604226
    #> 9998    0.49500  0.565662  0.849437
    #> 9999    0.50005  1.000000  1.000000
    #> 10000   0.50500  0.322174  0.604226
    #> 
    #> $statistics[[18]]
    #> DataFrame with 10000 rows and 3 columns
    #>             AUC   p.value       FDR
    #>       <numeric> <numeric> <numeric>
    #> 1        0.5000 1.0000000  1.000000
    #> 2        0.5049 0.5735395  0.868341
    #> 3        0.4800 0.0444225  0.625009
    #> 4        0.4850 0.0827480  0.625009
    #> 5        0.5000 1.0000000  1.000000
    #> ...         ...       ...       ...
    #> 9996    0.49500  0.565662  0.868341
    #> 9997    0.50005  1.000000  1.000000
    #> 9998    0.50500  0.322174  0.625009
    #> 9999    0.50500  0.322174  0.625009
    #> 10000   0.50500  0.322174  0.625009
    #> 
    #> $statistics[[19]]
    #> DataFrame with 10000 rows and 3 columns
    #>             AUC   p.value       FDR
    #>       <numeric> <numeric> <numeric>
    #> 1        0.4950  0.322174  0.619978
    #> 2        0.4999  0.995980  1.000000
    #> 3        0.5000  1.000000  1.000000
    #> 4        0.4850  0.082748  0.619978
    #> 5        0.5100  0.158341  0.619978
    #> ...         ...       ...       ...
    #> 9996     0.4949  0.557838  0.863504
    #> 9997     0.4951  0.573540  0.863504
    #> 9998     0.4850  0.176800  0.619978
    #> 9999     0.5050  0.322174  0.619978
    #> 10000    0.4949  0.557838  0.863504
    #> 
    #> $statistics[[20]]
    #> DataFrame with 10000 rows and 3 columns
    #>             AUC   p.value       FDR
    #>       <numeric> <numeric> <numeric>
    #> 1        0.5000  1.000000  1.000000
    #> 2        0.4947  0.635274  0.958759
    #> 3        0.5000  1.000000  1.000000
    #> 4        0.5000  1.000000  1.000000
    #> 5        0.5050  0.565662  0.869131
    #> ...         ...       ...       ...
    #> 9996    0.49995  1.000000  1.000000
    #> 9997    0.49015  0.323442  0.625372
    #> 9998    0.50005  1.000000  1.000000
    #> 9999    0.50005  1.000000  1.000000
    #> 10000   0.50500  0.322174  0.625372
    #> 
    #> $statistics[[21]]
    #> DataFrame with 10000 rows and 3 columns
    #>             AUC   p.value       FDR
    #>       <numeric> <numeric> <numeric>
    #> 1       0.50005  1.000000  1.000000
    #> 2       0.49990  0.995980  1.000000
    #> 3       0.50000  1.000000  1.000000
    #> 4       0.50510  0.648212  0.959177
    #> 5       0.49500  0.322174  0.615026
    #> ...         ...       ...       ...
    #> 9996     0.5051  0.557838  0.859750
    #> 9997     0.5049  0.573540  0.859750
    #> 9998     0.5148  0.182661  0.615026
    #> 9999     0.5000  1.000000  1.000000
    #> 10000    0.5051  0.557838  0.859750
    #> 
    #> $statistics[[22]]
    #> DataFrame with 10000 rows and 3 columns
    #>             AUC   p.value       FDR
    #>       <numeric> <numeric> <numeric>
    #> 1       0.50005  1.000000  1.000000
    #> 2       0.51000  0.158341  0.611076
    #> 3       0.49000  0.158341  0.611076
    #> 4       0.50995  0.318530  0.611076
    #> 5       0.48500  0.082748  0.611076
    #> ...         ...       ...       ...
    #> 9996     0.5000  1.000000  1.000000
    #> 9997     0.5100  0.158341  0.611076
    #> 9998     0.5101  0.405995  0.762863
    #> 9999     0.4950  0.322174  0.611076
    #> 10000    0.5100  0.158341  0.611076
    #> 
    #> $statistics[[23]]
    #> DataFrame with 10000 rows and 3 columns
    #>             AUC   p.value       FDR
    #>       <numeric> <numeric> <numeric>
    #> 1       0.50500 0.3221741  0.606038
    #> 2       0.50490 0.5735395  0.852213
    #> 3       0.48000 0.0444225  0.606038
    #> 4       0.49995 1.0000000  1.000000
    #> 5       0.49000 0.1583409  0.606038
    #> ...         ...       ...       ...
    #> 9996     0.5002 0.9879417  1.000000
    #> 9997     0.5049 0.5735395  0.852213
    #> 9998     0.5200 0.0444225  0.606038
    #> 9999     0.5000 1.0000000  1.000000
    #> 10000    0.5100 0.1583409  0.606038
    #> 
    #> $statistics[[24]]
    #> DataFrame with 10000 rows and 3 columns
    #>             AUC   p.value       FDR
    #>       <numeric> <numeric> <numeric>
    #> 1        0.5050  0.322174  0.619978
    #> 2        0.5001  0.995980  1.000000
    #> 3        0.5000  1.000000  1.000000
    #> 4        0.5150  0.082748  0.619978
    #> 5        0.4900  0.158341  0.619978
    #> ...         ...       ...       ...
    #> 9996     0.5051  0.557838  0.863504
    #> 9997     0.5049  0.573540  0.863504
    #> 9998     0.5150  0.176800  0.619978
    #> 9999     0.4950  0.322174  0.619978
    #> 10000    0.5051  0.557838  0.863504
    #> 
    #> $statistics[[25]]
    #> DataFrame with 10000 rows and 3 columns
    #>             AUC   p.value       FDR
    #>       <numeric> <numeric> <numeric>
    #> 1        0.5050  0.322174  0.618555
    #> 2        0.4947  0.635274  0.954724
    #> 3        0.5000  1.000000  1.000000
    #> 4        0.5150  0.082748  0.618555
    #> 5        0.4950  0.322174  0.618555
    #> ...         ...       ...       ...
    #> 9996     0.5051  0.557838  0.864937
    #> 9997     0.4948  0.641729  0.961248
    #> 9998     0.5152  0.171079  0.618555
    #> 9999     0.4950  0.322174  0.618555
    #> 10000    0.5100  0.158341  0.618555
    #> 
    #> $statistics[[26]]
    #> DataFrame with 10000 rows and 3 columns
    #>             AUC   p.value       FDR
    #>       <numeric> <numeric> <numeric>
    #> 1       0.49500  0.322174  0.620334
    #> 2       0.50520  0.641729  0.964426
    #> 3       0.50000  1.000000  1.000000
    #> 4       0.49000  0.158341  0.620334
    #> 5       0.50005  1.000000  1.000000
    #> ...         ...       ...       ...
    #> 9996    0.49995  1.000000  1.000000
    #> 9997    0.50985  0.323442  0.620334
    #> 9998    0.49995  1.000000  1.000000
    #> 9999    0.50500  0.322174  0.620334
    #> 10000   0.49500  0.322174  0.620334
    #> 
    #> $statistics[[27]]
    #> DataFrame with 10000 rows and 3 columns
    #>             AUC   p.value       FDR
    #>       <numeric> <numeric> <numeric>
    #> 1       0.49500  0.322174  0.619147
    #> 2       0.51500  0.082748  0.619147
    #> 3       0.49000  0.158341  0.619147
    #> 4       0.49500  0.322174  0.619147
    #> 5       0.49015  0.323442  0.619147
    #> ...         ...       ...       ...
    #> 9996    0.49500  0.565662  0.863244
    #> 9997    0.51500  0.082748  0.619147
    #> 9998    0.49490  0.557838  0.863244
    #> 9999    0.49995  1.000000  1.000000
    #> 10000   0.50000  1.000000  1.000000
    #> 
    #> $statistics[[28]]
    #> DataFrame with 10000 rows and 3 columns
    #>             AUC   p.value       FDR
    #>       <numeric> <numeric> <numeric>
    #> 1       0.50000 1.0000000  1.000000
    #> 2       0.50995 0.3185296  0.619978
    #> 3       0.48000 0.0444225  0.619978
    #> 4       0.48500 0.0827480  0.619978
    #> 5       0.49500 0.5656624  0.863114
    #> ...         ...       ...       ...
    #> 9996    0.49500  0.565662  0.863114
    #> 9997    0.50985  0.323442  0.619978
    #> 9998    0.50500  0.322174  0.619978
    #> 9999    0.50500  0.322174  0.619978
    #> 10000   0.50000  1.000000  1.000000
    #> 
    #> $statistics[[29]]
    #> DataFrame with 10000 rows and 3 columns
    #>             AUC   p.value       FDR
    #>       <numeric> <numeric> <numeric>
    #> 1        0.5000  1.000000  1.000000
    #> 2        0.5053  0.635274  0.958759
    #> 3        0.5000  1.000000  1.000000
    #> 4        0.5000  1.000000  1.000000
    #> 5        0.4950  0.565662  0.869131
    #> ...         ...       ...       ...
    #> 9996    0.50005  1.000000  1.000000
    #> 9997    0.50985  0.323442  0.625372
    #> 9998    0.49995  1.000000  1.000000
    #> 9999    0.49995  1.000000  1.000000
    #> 10000   0.49500  0.322174  0.625372
    #> 
    #> $statistics[[30]]
    #> DataFrame with 10000 rows and 3 columns
    #>             AUC   p.value       FDR
    #>       <numeric> <numeric> <numeric>
    #> 1        0.4950  0.322174  0.618555
    #> 2        0.5053  0.635274  0.954724
    #> 3        0.5000  1.000000  1.000000
    #> 4        0.4850  0.082748  0.618555
    #> 5        0.5050  0.322174  0.618555
    #> ...         ...       ...       ...
    #> 9996     0.4949  0.557838  0.864937
    #> 9997     0.5052  0.641729  0.961248
    #> 9998     0.4848  0.171079  0.618555
    #> 9999     0.5050  0.322174  0.618555
    #> 10000    0.4900  0.158341  0.618555
    #> 
    #> 
    #> $pairs
    #> DataFrame with 30 rows and 2 columns
    #>           first      second
    #>     <character> <character>
    #> 1         Astro        Endo
    #> 2         Astro       Micro
    #> 3         Astro      Neuron
    #> 4         Astro       Oligo
    #> 5         Astro         OPC
    #> ...         ...         ...
    #> 26          OPC       Astro
    #> 27          OPC        Endo
    #> 28          OPC       Micro
    #> 29          OPC      Neuron
    #> 30          OPC       Oligo
    

    Created on 2020-06-18 by the reprex package (v0.3.0)

    Edit #2:

    Just to see what my approach would give you in about 1/20,000th of the time (on my machine, at least) taken by StupidWolf's approach, try this with his example mat and group:

    set.seed(111)
    celltypes = c("astro","endo","micro","neuron","oligo","opc")
    mat = matrix(rnorm(10000*120),ncol=120)
    colnames(mat) = paste0("cell",1:120)
    rownames(mat) = paste0("gene",1:10000)
    metadata = data.frame(celltype=rep(celltypes,each=20))
    num_sample = 10
    use_cols = tapply(1:nrow(metadata),metadata$celltype,sample,num_sample)
    use_cols = unlist(use_cols)
    group = metadata$celltype[use_cols]
    
    library(scran)
    library(data.table)
    pwt <- pairwiseWilcox(mat[,use_cols], groups=group)
    unique_comps <- !duplicated(t(apply(pwt$pairs, 1, sort)))
    res <- rbindlist(setNames(lapply(pwt$statistics[unique_comps], 
                                     function(x) as.data.table(x, keep.rownames=TRUE)), 
                              apply(pwt$pairs[unique_comps,], 1, paste, collapse = '_')),
                     idcol = "comparison")[, .(comparison, rn, p.value)]
    
    setnames(res, "rn", "gene")
    res[gene=="gene999"]
    #>       comparison    gene   p.value
    #>  1:   astro_endo gene999 0.1858767
    #>  2:  astro_micro gene999 0.5707504
    #>  3: astro_neuron gene999 0.3846731
    #>  4:  astro_oligo gene999 0.4273553
    #>  5:    astro_opc gene999 0.3846731
    #>  6:   endo_micro gene999 0.4726756
    #>  7:  endo_neuron gene999 0.9097219
    #>  8:   endo_oligo gene999 0.6231762
    #>  9:     endo_opc gene999 0.6775850
    #> 10: micro_neuron gene999 0.9097219
    #> 11:  micro_oligo gene999 0.9097219
    #> 12:    micro_opc gene999 0.9097219
    #> 13: neuron_oligo gene999 0.8501067
    #> 14:   neuron_opc gene999 0.6775850
    #> 15:    oligo_opc gene999 0.9698500
    

    Created on 2020-06-19 by the reprex package (v0.3.0)
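
The long-format table above keeps only the raw p-values. If adjusted values are wanted as well, base R's p.adjust can be applied to that column. The following is a minimal sketch on a made-up table with the same column names; the values are hypothetical, and whether to adjust across all tests or within each comparison is a choice left to the analysis.

```r
# Toy stand-in for the long-format result (hypothetical values)
res <- data.frame(
  comparison = rep(c("astro_endo", "astro_micro"), each = 3),
  gene       = rep(paste0("gene", 1:3), times = 2),
  p.value    = c(0.001, 0.04, 0.5, 0.02, 0.03, 0.9)
)

# Benjamini-Hochberg adjustment over every test in the table
res$fdr_all <- p.adjust(res$p.value, method = "BH")

# Benjamini-Hochberg adjustment within each comparison separately
res$fdr_within <- ave(res$p.value, res$comparison,
                      FUN = function(p) p.adjust(p, method = "BH"))
```

With a data.table such as res from the example above, the same p.adjust calls work on res$p.value unchanged.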