
R: t.test multiple variables in dataframe with dplyr then summarise in table


Suppose I have this reproducible dataset:

set.seed(949494)
KPI1 <- round(runif(50, 1, 100))
KPI2 <- round(runif(50, 1, 100))
KPI3 <- round(runif(50, 1, 100))
ID <- rep(c("ID1", "ID2", "ID3", "ID4", "ID5", "ID6", "ID7", "ID8", "ID9", "ID10"), times = c(5, 5, 5, 5, 5, 5, 5, 5, 5, 5))
Stimuli <- rep(rep(c("A", "B"), times = c(5, 5)), 5)
AOI <- rep(c("Text", "Picture", "Button", "Product", "Logo"), 5)
DF <- data.frame(ID, Stimuli, AOI, KPI1, KPI2, KPI3)

Is it possible to do t.tests of all KPI columns per AOI between A & B Stimuli with dplyr?

Currently, I am doing this by hand on a much larger dataset which is very time-consuming:

#SUBSET DATAFRAME into A / B DATAFRAMES
DF_A <- subset(DF, Stimuli == "A")
DF_B <- subset(DF, Stimuli == "B")

#SUBSET A / B DATAFRAMES into AOI DATAFRAMES
DF_A_Text <- subset(DF_A, AOI == "Text")
DF_B_Text <- subset(DF_B, AOI == "Text")


#t.test AOIs A vs B
t.test(DF_A_Text$KPI1, DF_B_Text$KPI1)

t.test(DF_A_Text$KPI2, DF_B_Text$KPI2)

t.test(DF_A_Text$KPI3, DF_B_Text$KPI3)

I then repeat these steps for each remaining AOI ("Picture" ... "Logo"), which is very time-consuming. I think this is possible with dplyr; I just have not been able to work out the syntax for my specific use case.

The final goal is to show the p-value of each t-test next to the per-Stimuli summaries for A vs. B (the average of each KPI (1:3) across all IDs (1:10) for each AOI (1:5)).

Thankful for any help I can get as I'm an R beginner.


Solution

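  • I would use dplyr together with tidyr (for pivot_longer() and nest()) and purrr (for map_dbl()) as follows:

    library(dplyr)
    library(tidyr)  # pivot_longer(), nest()
    library(purrr)  # map_dbl()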
    
    DF %>% 
      pivot_longer(starts_with("KP"), names_to = "KP", values_to = "value") %>% 
      group_by(AOI, KP) %>% 
      nest() %>% 
      mutate(
        pval = map_dbl(data, ~t.test(value ~ Stimuli, data = .x)$p.value), 
        mean_a = map_dbl(data, ~mean(.x$value[.x$Stimuli == "A"])), 
        mean_b = map_dbl(data, ~mean(.x$value[.x$Stimuli == "B"]))
      ) %>% 
      ungroup() %>%   # drop the AOI/KP grouping left over from group_by()
      select(-data) %>% 
      arrange(KP, AOI)
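
As a sanity check, the sketch below rebuilds the question's data, runs the same pipeline, and compares one result with the manual t.test() call from the question. The names res, manual_p, pipeline_p and wide are only illustrative. The pivot_wider() step at the end is one possible way to lay out the summary table with one row per AOI; it is an assumption about the desired layout, not necessarily what the question's screenshot showed.

```r
library(dplyr)
library(tidyr)
library(purrr)

# Rebuild the question's example data so this block runs on its own
# (compact form; produces the same columns as the question's code).
set.seed(949494)
KPI1 <- round(runif(50, 1, 100))
KPI2 <- round(runif(50, 1, 100))
KPI3 <- round(runif(50, 1, 100))
ID <- rep(paste0("ID", 1:10), each = 5)
Stimuli <- rep(rep(c("A", "B"), each = 5), 5)
AOI <- rep(c("Text", "Picture", "Button", "Product", "Logo"), 10)
DF <- data.frame(ID, Stimuli, AOI, KPI1, KPI2, KPI3)

# Same pipeline as in the answer: one row per AOI x KPI combination.
res <- DF %>%
  pivot_longer(starts_with("KPI"), names_to = "KP", values_to = "value") %>%
  group_by(AOI, KP) %>%
  nest() %>%
  mutate(
    pval   = map_dbl(data, ~ t.test(value ~ Stimuli, data = .x)$p.value),
    mean_a = map_dbl(data, ~ mean(.x$value[.x$Stimuli == "A"])),
    mean_b = map_dbl(data, ~ mean(.x$value[.x$Stimuli == "B"]))
  ) %>%
  ungroup() %>%
  select(-data) %>%
  arrange(KP, AOI)

# Manual A vs. B test for AOI "Text" and KPI1, as done by hand in the question.
manual_p <- t.test(
  DF$KPI1[DF$Stimuli == "A" & DF$AOI == "Text"],
  DF$KPI1[DF$Stimuli == "B" & DF$AOI == "Text"]
)$p.value

# The pipeline's p-value for the same AOI/KPI cell.
pipeline_p <- res$pval[res$AOI == "Text" & res$KP == "KPI1"]
all.equal(manual_p, pipeline_p)

# Optional (assumed layout): one row per AOI, with columns such as
# mean_a_KPI1, mean_b_KPI1 and pval_KPI1.
wide <- res %>%
  pivot_wider(names_from = KP, values_from = c(mean_a, mean_b, pval))
wide
```

Note that t.test() defaults to Welch's unequal-variance test in both the manual and the formula form, which is why the two p-values agree.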