Tags: r, statistics, mapply, pairwise.wilcox.test

How can I compare the values in each set of four rows to a single value (Wilcoxon test specifying mu)?


I need to compare the values in every block of four rows to a single value (as mu) using a Wilcoxon signed-rank test. For example, if my data look like this:

df1 <- c(0.205346764819837, 0.260927758796802, 0.243880102849495, 0.244549329012715, 
      0.122609277587968, 0.19381141911169, 0.0617801415941672, 0.217762671269064, 
      0.0513190799901377, 0.293455672572294, 0.222447254411609, 0.271001373674756, 
      0.00119756260786869, 0.119069423408827, -0.0164312634285513, 
      0.0446268183579303)

df2 <- c(0.23340509, 0.05959987, 0.17380963, 0.14517836)

# The test calls below work on a data frame holding df1 in column A
dfnew <- data.frame(A = df1)

I am using wilcox.test() to compare each block of four values from df1 against the corresponding value from df2, passed as mu. For the first four rows alone, it would be:

wilcox.test(dfnew$A[1:4], mu = 0.23340509)$p.value
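As a sanity check on the single-block call above, the statistic and its exact p-value can be reproduced by hand. This is a minimal sketch assuming the exact, no-ties case, which is what stats::wilcox.test() uses for a small sample like this one; the names x, mu, d, V and p_exact are illustrative only.

```r
# First block of four values and its reference value (from the question's data)
x  <- c(0.205346764819837, 0.260927758796802, 0.243880102849495, 0.244549329012715)
mu <- 0.23340509

d <- x - mu                      # differences from the hypothesised location
V <- sum(rank(abs(d))[d > 0])    # signed-rank statistic: sum of ranks of positive differences
n <- length(d)

# Exact two-sided p-value from the null distribution of V (assumes no ties, no zero differences)
p_exact <- min(1, 2 * min(psignrank(V, n), 1 - psignrank(V - 1, n)))

V                                # 6
p_exact                          # 0.875
wilcox.test(x, mu = mu)$p.value  # 0.875, matching the hand computation
```

With ties or zero differences, wilcox.test() cannot compute the exact p-value and falls back to a normal approximation, so the hand computation would no longer match.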

I realise I could split the data into groups of four rows using:

split(dfnew, as.integer(gl(nrow(dfnew), 4, nrow(dfnew))))

I was hoping to adapt this for use with mapply() (so that I could parallelise it with future.apply, given the size of my real data frame). However, I am unsure how to compare each block of four rows against its own value from a separate object (df2), passed as mu.
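A minimal base-R sketch of the split()/mapply() route described above, assuming the 16 values sit in column A of dfnew and that block i is paired with df2[i]. The future.apply call appears only as a comment and is an assumption about its interface; it is not exercised here.

```r
# Data from the question: 16 values (four blocks of four) and one mu per block
dfnew <- data.frame(A = c(0.205346764819837, 0.260927758796802, 0.243880102849495, 0.244549329012715,
                          0.122609277587968, 0.19381141911169, 0.0617801415941672, 0.217762671269064,
                          0.0513190799901377, 0.293455672572294, 0.222447254411609, 0.271001373674756,
                          0.00119756260786869, 0.119069423408827, -0.0164312634285513,
                          0.0446268183579303))
df2 <- c(0.23340509, 0.05959987, 0.17380963, 0.14517836)

# Grouping from the question: gl(n, 4, n) labels rows 1,1,1,1,2,2,2,2,...
grp <- as.integer(gl(nrow(dfnew), 4, nrow(dfnew)))

# One numeric vector per block; block i must line up with df2[i]
blocks <- split(dfnew$A, grp)
stopifnot(length(blocks) == length(df2))

# mapply() walks blocks and df2 in parallel: x is one block, m is its mu
pvals <- mapply(function(x, m) wilcox.test(x, mu = m)$p.value, blocks, df2)
pvals  # 0.875 0.125 0.875 0.125, named "1" to "4"

# Assumption: future.apply::future_mapply() takes the same arguments, so after
# setting a plan (e.g. future::plan(future::multisession)) the call would be
# future.apply::future_mapply(function(x, m) wilcox.test(x, mu = m)$p.value, blocks, df2)
```

Splitting only the A column, rather than the whole data frame, keeps each task small, which may matter when the blocks are sent to parallel workers.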


Solution

  • You can create a grouping variable with rep() and then apply the test within each group:

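    library(data.table)
    # Convert to a data.table in place and label consecutive blocks of four rows 1, 2, 3, ...
    setDT(dfnew)[, grp := rep(1:(.N/4), each = 4, length.out = .N)]
    # For each block, test A against the matching element of df2 (.BY holds the current group value)
    dfnew[, .(pval = wilcox.test(A, mu = df2[.BY$grp])$p.value), by = grp]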
    

    Output:

         grp  pval
       <int> <num>
    1:     1 0.875
    2:     2 0.125
    3:     3 0.875
    4:     4 0.125
    

    Similarly, using dplyr:

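    library(dplyr)
    # Label consecutive blocks of four rows, then test each block against its own element of df2
    dfnew %>% 
      group_by(grp = rep(1:(n()/4), each = 4, length.out = n())) %>% 
      summarize(pval = wilcox.test(A, mu = df2[cur_group()$grp])$p.value)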
    

    Output:

        grp  pval
      <int> <dbl>
    1     1 0.875
    2     2 0.125
    3     3 0.875
    4     4 0.125
    

    Another approach is to group directly by the mu values, so that each block's mu is available as .BY$mu:

    setDT(dfnew)[, .(pval = wilcox.test(A, mu = .BY$mu)$p.value), by = .(mu = rep(df2, each = 4))]
    

    Output:

               mu  pval
            <num> <num>
    1: 0.23340509 0.875
    2: 0.05959987 0.125
    3: 0.17380963 0.875
    4: 0.14517836 0.125
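The grouping keys in these answers rest on two assumptions that hold for the example data but may not for a larger data set: the row count is a multiple of four, and (for the mu-keyed variant) no two blocks share the same mu. The base-R sketch below checks both with hypothetical values (n = 18 and mu_dup are made up for illustration). In data.table, the integer-division key could replace the rep() call as grp := (seq_len(.N) - 1) %/% 4 + 1.

```r
# Pitfall 1: rep(1:(n/4), each = 4, length.out = n) assumes n is a multiple of 4.
n <- 18                                   # hypothetical row count
g_rep <- rep(1:(n / 4), each = 4, length.out = n)
g_rep   # 1 1 1 1 2 2 2 2 3 3 3 3 4 4 4 4 1 1  (last two rows recycle into group 1)

# Integer division keeps leftover rows in their own, shorter final group
g_div <- (seq_len(n) - 1) %/% 4 + 1
g_div   # 1 1 1 1 2 2 2 2 3 3 3 3 4 4 4 4 5 5

# Pitfall 2: grouping by the mu value merges blocks that share a value.
mu_dup <- c(0.2, 0.1, 0.2, 0.3)           # hypothetical mu vector with a repeated value
table(rep(mu_dup, each = 4))              # keys 0.1, 0.2, 0.3 with 4, 8, 4 rows: three groups, not four
table(rep(seq_along(mu_dup), each = 4))   # grouping by block index keeps four groups of 4
```

If a shorter final group appears, df2 would also need a matching extra value; otherwise df2[.BY$grp] returns NA for that group.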