Here is the dput() output of the data:
library(tidyverse)
df <- structure(list(L1 = c("Age Class", "Age Class", "Age Class",
"Age Class", "Gender", "Gender", "Gender", "Age Class", "Age Class",
"Age Class", "Gender", "Gender", "Age Class", "Age Class", "Age Class",
"Gender"), L2 = c("Older Youth", "Older Youth", "Younger Youth",
"Younger Youth", "Female", "Female", "Female", "Younger Youth",
"Older Youth", "Older Youth", "Male", "Male", "Younger Youth",
"Older Youth", "Older Youth", "Female"), scr = c(0.78125, 0.90625,
0.90625, 0.6875, 0.875, 0.78125, 1, 0.65625, 0.75, 0.59375, 0.8125,
0.75, 0.65625, 0.6875, 0.75, 0.75)), row.names = c(NA, -16L), class = "data.frame")
I want to:
1. Compute the median and standard error of scr as overall statistics.
2. Compute the median and standard error again, grouped by L1 and L2.
3. Run a Wilcoxon test within each L1 group, since each group contains two levels of L2.
4. Merge the three results: bind_rows() the results of steps 1 and 2, then left_join() the p-values (step 3) onto that dataset.
The desired end result will look like the picture below:
I have tried creating a list() for each of the steps within dplyr, but handling a list() (i.e. selecting or filtering) in dplyr or a piped environment is cumbersome. The following chunk works, but I want to reduce the list handling as much as possible. In particular, I think the second half of the code can be reduced or simplified.
df %>%
  list(
    a = {.} %>% mutate(L1 = "All", L2 = "All") %>%
      summarise(mdn = median(scr), se = (sd(scr) / sqrt(length(scr))), .by = c(L1, L2)),
    b = {.} %>%
      summarise(mdn = median(scr), se = (sd(scr) / sqrt(length(scr))), .by = c(L1, L2)),
    c = {.} %>% summarise(pv = wilcox.test(scr ~ L2)$p.value, .by = L1)
  ) %>%
  list(
    d = {.} %>% keep(names(.) %in% c('a', 'b')) %>% bind_rows(), # Reduce code from this line
    c = {.} %>% pluck("c")
  ) %>%
  keep(names(.) %in% c('c', 'd')) %>%
  reduce(left_join, by = "L1") # to this line
I would like to know whether there is any scope for nesting the data frame, or any purrr::map() way of shortening the code.
Using %$% (the exposition pipe from magrittr) instead of %>% exposes the left-hand side to the next expression without inserting it as the first argument of the call. The final step can therefore be written the same way as with regular data frames: %$% left_join(bind_rows(.$a, .$b), .$c) will suffice.
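To see the difference between the two pipes in isolation, here is a minimal sketch. The toy data frame d and its column v are hypothetical and only serve as an illustration; they are not part of the question's data.

```r
library(magrittr)

# Toy data frame, hypothetical and unrelated to the data in the question.
d <- data.frame(v = c(1, 2, 3))

# %>%: the dot only appears inside a nested call, so the left-hand side
# is still inserted as the first (unnamed) element of the list.
piped <- d %>% list(total = sum(.$v))
names(piped)

# %$%: the columns of d are exposed to the expression and nothing is
# inserted, so the list holds only the element that was asked for.
exposed <- d %$% list(total = sum(v))
names(exposed)
```

This is why the %>% version in the question needs keep() and pluck() to drop the extra element, while the %$% version does not.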
library(magrittr)
df %$%
  list(
    a = {.} %>% mutate(L1 = "All", L2 = "All") %>%
      summarise(mdn = median(scr), se = (sd(scr) / sqrt(length(scr))), .by = c(L1, L2)),
    b = {.} %>%
      summarise(mdn = median(scr), se = (sd(scr) / sqrt(length(scr))), .by = c(L1, L2)),
    c = {.} %>% summarise(pv = wilcox.test(scr ~ L2)$p.value, .by = L1)
  ) %$%
  left_join(bind_rows(.$a, .$b), .$c)
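If avoiding list() altogether is acceptable, the same four steps can also be sketched with plain intermediate data frames. This is one possible alternative, not the approach from the answer above. The helper summarise_scr() and the names overall, by_group, p_values and result are hypothetical, and the sketch assumes dplyr 1.1.0 or later for the .by argument.

```r
library(dplyr)

# Same values as the dput() output in the question, rebuilt compactly.
df <- data.frame(
  L1 = rep(c("Age Class", "Gender", "Age Class", "Gender", "Age Class", "Gender"),
           times = c(4, 3, 3, 2, 3, 1)),
  L2 = c("Older Youth", "Older Youth", "Younger Youth", "Younger Youth",
         "Female", "Female", "Female", "Younger Youth", "Older Youth",
         "Older Youth", "Male", "Male", "Younger Youth", "Older Youth",
         "Older Youth", "Female"),
  scr = c(0.78125, 0.90625, 0.90625, 0.6875, 0.875, 0.78125, 1, 0.65625,
          0.75, 0.59375, 0.8125, 0.75, 0.65625, 0.6875, 0.75, 0.75)
)

# Hypothetical helper: median and standard error of scr per L1/L2 group.
summarise_scr <- function(data) {
  summarise(data,
            mdn = median(scr),
            se  = sd(scr) / sqrt(n()),
            .by = c(L1, L2))
}

# Step 1: overall statistics, labelled "All" so they stack with the groups.
overall <- df %>% mutate(L1 = "All", L2 = "All") %>% summarise_scr()

# Step 2: statistics per L1/L2 group.
by_group <- summarise_scr(df)

# Step 3: Wilcoxon test of scr between the two L2 levels within each L1.
# wilcox.test() may warn about ties for this data.
p_values <- df %>% summarise(pv = wilcox.test(scr ~ L2)$p.value, .by = L1)

# Step 4: stack steps 1 and 2, then attach the p-values by L1.
result <- bind_rows(overall, by_group) %>% left_join(p_values, by = "L1")
result
```

The "All" row has no matching L1 in p_values, so its pv is NA after the join. The trade-off is three extra names in the workspace in exchange for no dot or list handling inside the pipe.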