Suppose I have this reproducible dataset:
set.seed(949494)
KPI1 <- round(runif(50, 1, 100))
KPI2 <- round(runif(50, 1, 100))
KPI3 <- round(runif(50, 1, 100))
ID <- rep(c("ID1", "ID2", "ID3", "ID4", "ID5", "ID6", "ID7", "ID8", "ID9", "ID10"), times = c(5, 5, 5, 5, 5, 5, 5, 5, 5, 5))
Stimuli <- rep(rep(c("A", "B"), times = c(5, 5)), 5)
AOI <- rep(c("Text", "Picture", "Button", "Product", "Logo"), 5)
DF <- data.frame(ID, Stimuli, AOI, KPI1, KPI2, KPI3)
Is it possible to run t-tests on all KPI columns, per AOI, comparing Stimuli A and B, using dplyr?
Currently, I am doing this by hand on a much larger dataset, which is very time-consuming:
#SUBSET DATAFRAME into A / B DATAFRAMES
DF_A <- subset(DF, Stimuli == "A")
DF_B <- subset(DF, Stimuli == "B")
#SUBSET A / B DATAFRAMES into AOI DATAFRAMES
DF_A_Text <- subset(DF_A, AOI == "Text")
DF_B_Text <- subset(DF_B, AOI == "Text")
#t.test AOIs A vs B
t.test(DF_A_Text$KPI1, DF_B_Text$KPI1)
t.test(DF_A_Text$KPI2, DF_B_Text$KPI2)
t.test(DF_A_Text$KPI3, DF_B_Text$KPI3)
I then repeat these steps for each remaining AOI ("Picture" ... "Logo"), which is very time-consuming. I think it is possible with dplyr, but I have not been able to work out the syntax for my specific use case.
The final goal is to show the p-value of each t-test next to the per-stimulus summaries (A vs. B), i.e. the mean of each KPI (1 to 3) across all IDs (1 to 10) for each AOI (1 to 5).
Thankful for any help I can get as I'm an R beginner.
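I would use dplyr together with tidyr and purrr for this analysis (pivot_longer() and nest() come from tidyr, map_dbl() from purrr), as follows:
library(dplyr)
library(tidyr)  # pivot_longer(), nest()
library(purrr)  # map_dbl()

DF %>%
  # Reshape to long format: one row per original row and KPI column
  pivot_longer(starts_with("KPI"), names_to = "KP", values_to = "value") %>%
  # Build one nested data frame per AOI/KPI combination
  group_by(AOI, KP) %>%
  nest() %>%
  mutate(
    # t.test() defaults to the Welch test, same as the manual calls in the question
    pval = map_dbl(data, ~ t.test(value ~ Stimuli, data = .x)$p.value),
    mean_a = map_dbl(data, ~ mean(.x$value[.x$Stimuli == "A"])),
    mean_b = map_dbl(data, ~ mean(.x$value[.x$Stimuli == "B"]))
  ) %>%
  ungroup() %>%
  select(-data) %>%
  arrange(KP, AOI)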
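To check that the nested approach gives the same numbers as the manual subsetting in the question, one combination can be compared directly. The following is only a minimal sketch: the names res, manual and piped are introduced here for illustration, and AOI is repeated 10 times explicitly, which should give the same column that data.frame() produces by recycling in the question.

```r
library(dplyr)
library(tidyr)
library(purrr)

# Rebuild the example data from the question
set.seed(949494)
KPI1 <- round(runif(50, 1, 100))
KPI2 <- round(runif(50, 1, 100))
KPI3 <- round(runif(50, 1, 100))
ID <- rep(paste0("ID", 1:10), each = 5)
Stimuli <- rep(rep(c("A", "B"), times = c(5, 5)), 5)
# Repeated 10 times explicitly instead of relying on recycling
AOI <- rep(c("Text", "Picture", "Button", "Product", "Logo"), 10)
DF <- data.frame(ID, Stimuli, AOI, KPI1, KPI2, KPI3)

# p-values from the nested approach (res is an illustrative name)
res <- DF %>%
  pivot_longer(starts_with("KPI"), names_to = "KP", values_to = "value") %>%
  group_by(AOI, KP) %>%
  nest() %>%
  mutate(pval = map_dbl(data, ~ t.test(value ~ Stimuli, data = .x)$p.value)) %>%
  ungroup() %>%
  select(-data)

# p-value from the manual approach for AOI "Text" and KPI1
manual <- t.test(
  DF$KPI1[DF$Stimuli == "A" & DF$AOI == "Text"],
  DF$KPI1[DF$Stimuli == "B" & DF$AOI == "Text"]
)$p.value

piped <- res$pval[res$AOI == "Text" & res$KP == "KPI1"]

# Evaluates to TRUE when both approaches agree
isTRUE(all.equal(manual, piped))
```

The same comparison can be repeated for any other AOI/KPI pair by changing the two filter values.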