Tags: r, dplyr, tidyverse

Bind rows and perform a left join on a set of piped lists using the tidyverse


Here is the dput() output of the data (assigned to df, which the code below uses).

library(tidyverse)

df <- structure(list(L1 = c("Age Class", "Age Class", "Age Class", 
"Age Class", "Gender", "Gender", "Gender", "Age Class", "Age Class", 
"Age Class", "Gender", "Gender", "Age Class", "Age Class", "Age Class", 
"Gender"), L2 = c("Older Youth", "Older Youth", "Younger Youth", 
"Younger Youth", "Female", "Female", "Female", "Younger Youth", 
"Older Youth", "Older Youth", "Male", "Male", "Younger Youth", 
"Older Youth", "Older Youth", "Female"), scr = c(0.78125, 0.90625, 
0.90625, 0.6875, 0.875, 0.78125, 1, 0.65625, 0.75, 0.59375, 0.8125, 
0.75, 0.65625, 0.6875, 0.75, 0.75)), row.names = c(NA, -16L), class = "data.frame")


  1. I want to compute the median and standard error as overall statistics.

  2. Compute the median and standard error again, grouped by L1 and L2.

  3. Perform a Wilcoxon test within each L1 group, since each one contains 2 levels of L2.

  4. Merge the three results: bind_rows() the results of steps 1 and 2, then left_join() the p-values from step 3 onto the combined data by L1.
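For comparison, here is a minimal sketch of steps 1 to 4 written as separate data frames with no intermediate list, assuming dplyr >= 1.1.0 (needed for `.by`). The helper `summ()` and the object names `overall`, `by_group`, `pvals` and `result` are hypothetical; the data are the same values as the dput() output above.

```r
library(dplyr)  # assumes dplyr >= 1.1.0 for the .by argument

# Same values as the dput() output in the question
df <- data.frame(
  L1 = c("Age Class", "Age Class", "Age Class", "Age Class", "Gender", "Gender",
         "Gender", "Age Class", "Age Class", "Age Class", "Gender", "Gender",
         "Age Class", "Age Class", "Age Class", "Gender"),
  L2 = c("Older Youth", "Older Youth", "Younger Youth", "Younger Youth", "Female",
         "Female", "Female", "Younger Youth", "Older Youth", "Older Youth", "Male",
         "Male", "Younger Youth", "Older Youth", "Older Youth", "Female"),
  scr = c(0.78125, 0.90625, 0.90625, 0.6875, 0.875, 0.78125, 1, 0.65625, 0.75,
          0.59375, 0.8125, 0.75, 0.65625, 0.6875, 0.75, 0.75)
)

# Hypothetical helper: median and standard error of scr per (L1, L2) combination
summ <- function(d) {
  summarise(d, mdn = median(scr), se = sd(scr) / sqrt(n()), .by = c(L1, L2))
}

overall  <- df %>% mutate(L1 = "All", L2 = "All") %>% summ()   # step 1
by_group <- summ(df)                                            # step 2
pvals    <- df %>%                                              # step 3
  summarise(pv = wilcox.test(scr ~ L2)$p.value, .by = L1)

# Step 4: stack the summaries, then attach p-values by L1 (the "All" row gets NA)
result <- bind_rows(overall, by_group) %>% left_join(pvals, by = "L1")
result
```

wilcox.test() will warn that exact p-values cannot be computed with ties; the p-values are still returned (normal approximation).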

The desired end result should look like the picture below:

[Image: desired end result]

I have tried creating a list() for each of the steps within a dplyr pipe, but handling the list (i.e. selecting or filtering its elements) inside dplyr or a piped environment is cumbersome. The following chunk works, but I want to reduce the list handling as much as possible; I think the second half of the code in particular can be reduced or simplified.

df %>% 
  list(
    a={.} %>% mutate(L1="All", L2="All") %>% summarise(mdn=median(scr), se=(sd(scr)/sqrt(length(scr))), .by = c(L1, L2)),
    b={.} %>% summarise(mdn=median(scr), se=(sd(scr)/sqrt(length(scr))), .by = c(L1, L2)),
    c={.} %>% summarise(pv= wilcox.test(scr~L2)$p.value, .by = L1)) %>% 
  list(
    d= {.} %>% keep(names(.) %in% c('a','b')) %>% bind_rows(), #Reduce codes from this line
    c= {.} %>% pluck("c")) %>% 
  keep(names(.) %in% c('c','d')) %>%
  reduce(left_join, by="L1") #to this line
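Why the second half needs keep(): with %>%, when `.` appears only inside nested calls (as in `a = {.} %>% ...`), magrittr still inserts the left-hand side as the first argument, so each list() above receives an extra unnamed element. Wrapping the right-hand side in braces is magrittr's documented way to suppress that insertion. A toy illustration (the vector 1:3 and the names `x`, `z` are hypothetical, not from the question):

```r
library(magrittr)

# `.` appears only inside nested calls, so %>% ALSO passes the LHS as the first
# argument: this evaluates list(1:3, a = sum(1:3), b = length(1:3))
x <- 1:3 %>% list(a = sum(.), b = length(.))
names(x)  # "" "a" "b"

# Wrapping the RHS in braces stops the first-argument insertion
z <- 1:3 %>% { list(a = sum(.), b = length(.)) }
names(z)  # "a" "b"
```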

I would also like to know whether there is any scope for nesting the data frame, or any purrr::map() way of shortening the script.
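On the nesting question, a sketch of step 3 only, using a nested data frame and purrr::map_dbl(). It assumes tidyr >= 1.0 (for the `nest(data = ...)` syntax) and is one possible alternative to `summarise(.by = L1)`, not the asker's method; the resulting `pvals` could then be joined onto the bind_rows() result by L1.

```r
library(dplyr)
library(tidyr)
library(purrr)

# Same values as the dput() output in the question
df <- data.frame(
  L1 = c("Age Class", "Age Class", "Age Class", "Age Class", "Gender", "Gender",
         "Gender", "Age Class", "Age Class", "Age Class", "Gender", "Gender",
         "Age Class", "Age Class", "Age Class", "Gender"),
  L2 = c("Older Youth", "Older Youth", "Younger Youth", "Younger Youth", "Female",
         "Female", "Female", "Younger Youth", "Older Youth", "Older Youth", "Male",
         "Male", "Younger Youth", "Older Youth", "Older Youth", "Female"),
  scr = c(0.78125, 0.90625, 0.90625, 0.6875, 0.875, 0.78125, 1, 0.65625, 0.75,
          0.59375, 0.8125, 0.75, 0.65625, 0.6875, 0.75, 0.75)
)

# One row per L1 level; that level's L2/scr rows go into the list-column `data`
pvals <- df %>%
  nest(data = c(L2, scr)) %>%
  # map_dbl() runs the test on each nested data frame and returns one p-value each
  mutate(pv = map_dbl(data, function(d) wilcox.test(scr ~ L2, data = d)$p.value)) %>%
  select(L1, pv)
pvals
```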


Solution

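  • Using magrittr's exposition pipe %$% instead of %>% means the left-hand side is no longer inserted as the first argument of the right-hand call: the right-hand side is evaluated with the contents of the left-hand side exposed, and `.` still refers to the left-hand side. The list therefore contains only a, b and c, and the whole second half reduces to %$% left_join(bind_rows(.$a, .$b), .$c), just as you would write it with ordinary data frames.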

    library(magrittr)
    df %$% 
      list(
        a={.} %>% mutate(L1="All", L2="All") %>% summarise(mdn=median(scr), se=(sd(scr)/sqrt(length(scr))), .by = c(L1, L2)),
        b={.} %>% summarise(mdn=median(scr), se=(sd(scr)/sqrt(length(scr))), .by = c(L1, L2)),
        c={.} %>% summarise(pv= wilcox.test(scr~L2)$p.value, .by = L1)
        ) %$% 
      left_join(bind_rows(.$a, .$b), .$c, by = "L1")
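A toy illustration of the exposition pipe used above (the list `parts` and its columns are hypothetical): the right-hand side sees the list's elements by name, `.` is bound to the whole list, and nothing is inserted as a first argument. With %>%, the same rbind(.$a, .$b) call would also receive `parts` itself as an extra first argument.

```r
library(magrittr)

# Hypothetical toy list of two data frames
parts <- list(a = data.frame(k = 1, v = 10), b = data.frame(k = 2, v = 20))

# %$% evaluates the RHS with the list's elements exposed by name ...
combined <- parts %$% rbind(a, b)
# ... and with `.` bound to the whole list; `parts` is not inserted as a first argument
same <- parts %$% rbind(.$a, .$b)
combined
```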