Multiple wilcox.tests across columns using variables in first column (R)


I have this data.frame

 df <- data.frame(
      variable=c(2.4860651, -0.68863024, 2.63530974, -2.95754943, 1.67945091, 2.63530974,
           4.79002539, 2.32575938, 3.57236441, -0.364825998, -2.00646016, -3.12380516, 
           0.69307013, -5.65846824, 0.45632519, 2.08978142),
      A=c(0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0),
      B=c(1, 1, 0, 1, 0, 0, 1, 1, 1, 1, 0, 1, 0, 1, 0, 0),
      C=c(0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 1, 1),
      D=c(1, 1, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0),
      E=c(0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0),
      F=c(0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1))

I would like to run wilcox.test for each of the columns A to F, using the 0/1 values in that column to define the two groups and the values in df$variable as the data to compare. The p-values should then be added as a new row, with the adjusted p-values in another row below it.

I have tried this:

 library(dplyr)
 result <- df %>% summarise(across(!variable, ~wilcox.test(.x ~ variable)$p.value), exact=NULL) %>%
         bind_rows(., p.adjust(., method = 'BH')) %>%
         bind_rows(df, .) %>%
         mutate(variable=replace(variable, is.na(variable), c('p.values', 'p.adjust')))

But this causes errors.

This is the result I would like to get:

 result <- data.frame(
      variable=c(2.4860651, -0.68863024, 2.63530974, -2.95754943, 1.67945091, 2.63530974,
           4.79002539, 2.32575938, 3.57236441, -0.364825998, -2.00646016, -3.12380516, 
           0.69307013, -5.65846824, 0.45632519, 2.08978142, 'p.value', 'p.adjust'),
      A=c(0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1),
      B=c(1, 1, 0, 1, 0, 0, 1, 1, 1, 1, 0, 1, 0, 1, 0, 0, 0.560444274, 1),
      C=c(0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 1, 1, 0.143117298, 0.764253489),
      D=c(1, 1, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0.820753088, 1),
      E=c(0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0.95482869, 1),
      F=c(0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0.254751163, 0.764253489))

Can anyone help?


Solution

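  • The attempt fails mainly because the formula is reversed: wilcox.test() expects response ~ group, so .x ~ variable treats variable as the grouping factor, which does not have exactly two levels. In addition, exact=NULL sits outside across(), so it is passed to summarise() rather than to wilcox.test(). One way around this is to subset variable by each 0/1 column directly: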

    library(dplyr)
    
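    # One p-value per 0/1 column: compare `variable` where the column is 0 vs. 1
    tmp <- df %>%
      summarise(across(!variable,
                       ~ wilcox.test(variable[.x == 0], variable[.x == 1])$p.value))
    
    # Benjamini-Hochberg adjustment across all columns' p-values
    adj_value <- p.adjust(unlist(tmp), method = "BH")
    
    # `variable` has to be character so the two label rows can be appended
    result <- bind_rows(
      df %>% mutate(variable = as.character(variable)),
      rbind(tmp, adj_value) %>%
        mutate(variable = c('p.values', 'p.adjust'))
    )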
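
For comparison, the same idea can be sketched in base R with the formula interface, putting the numeric response on the left and the 0/1 grouping column on the right. This is only a minimal sketch: it rebuilds the `df` from the question so the block runs on its own, the names `group_cols`, `p_values` and `p_adjusted` are illustrative, and `exact = FALSE` is an assumption added here to request the normal approximation, since `variable` contains a tied value.

```r
# Data from the question, rebuilt so this block is self-contained
df <- data.frame(
  variable = c(2.4860651, -0.68863024, 2.63530974, -2.95754943, 1.67945091, 2.63530974,
               4.79002539, 2.32575938, 3.57236441, -0.364825998, -2.00646016, -3.12380516,
               0.69307013, -5.65846824, 0.45632519, 2.08978142),
  A = c(0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0),
  B = c(1, 1, 0, 1, 0, 0, 1, 1, 1, 1, 0, 1, 0, 1, 0, 0),
  C = c(0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 1, 1),
  D = c(1, 1, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0),
  E = c(0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0),
  F = c(0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1)
)

# Every column except `variable` defines one 0/1 grouping (illustrative name)
group_cols <- setdiff(names(df), "variable")

# Build `variable ~ <column>` for each grouping column and keep the p-value.
# exact = FALSE is an assumption: it asks for the normal approximation.
p_values <- sapply(group_cols, function(col) {
  wilcox.test(reformulate(col, response = "variable"),
              data = df, exact = FALSE)$p.value
})

# Benjamini-Hochberg adjustment over the six tests
p_adjusted <- p.adjust(p_values, method = "BH")

# Two labelled rows, one column per grouping variable
print(rbind(p.value = p_values, p.adjust = p_adjusted))
```

Keeping the p-values in a separate two-row matrix like this avoids converting `variable` to character; whether that is preferable to appending the rows to `df` depends on how the result is used afterwards.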