I am trying to loop over certain columns of a data frame and, for each one, compute statistical summaries, run tests, and draw a plot. I am new to for loops in R and only partly understand !!sym. How should I use it when calculating the differences between treatments that feed into the Shapiro test?
For example, in the dataset below I want to run a paired t-test to see whether treatment (A vs. B, per sample) has any influence on Effect, Intake, and Temperature, but I first need to check whether my data are normally distributed. Hence I use shapiro.test in the code below.
Example dataset from an Excel sheet:
ID Treatment Effect Intake Count Temperature
1 A 0.1 1 8 20
1 B 0.4 3 9 21
2 A 0.1 3 0 27
2 B 0.2 4 5 28
3 A 0.4 1 14 21
3 B 0.6 4 4 23
... ... ... ... ...
library(tidyverse)
library(readxl)
df <- read_excel(paste0(getwd(),"/Data.xlsm"), sheet="data")
for (i in c("Effect", "Intake", "Temperature")) {
  # other code is here for means, etc.
  # code for shapiro test where i am having the issue
  mean_diff <- with(df, (!!sym(i))[Treatment == "A"] - (!!sym(i))[Treatment == "B"])
  s_test <- tidy(shapiro.test(mean_diff))
  # other code to graph
}
Error I get at the mean_diff line:
Error in !sym(i) : invalid argument type
Here's a solution based on the tidyverse. It uses random data (with your original variable names) because I composed it before your edit was visible.
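library(tidyverse)

# Random example data: two treatments, five rows each
df <- tibble(
  Treatment = rep(c("Saline", "CNO"), each = 5),
  Poke_Time = rnorm(10),
  Retrieval_Time = rnorm(10)
)

df %>%
  summarise(
    across(
      -Treatment,
      function(x) {
        # cur_column() gives the name of the column across() is currently processing
        y <- df %>% filter(Treatment == "Saline") %>% pull(cur_column())
        z <- df %>% filter(Treatment == "CNO") %>% pull(cur_column())
        # Rows are paired by position, so both groups must be in the same order
        shapiro.test(y - z)$p.value
      }
    )
  )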
# A tibble: 1 × 2
  Poke_Time Retrieval_Time
      <dbl>          <dbl>
1     0.586          0.600
You can edit the code to return summaries other than p.value if you wish. You can even capture the entire output from shapiro.test with list(shapiro.test(y - z)).
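If you would rather keep your for loop, here is a minimal sketch of one way to do it. As far as I can tell, the error arises because !! is only given its unquoting meaning by tidy-evaluation functions such as the dplyr verbs; with() is base R, so !!sym(i) is evaluated as two logical negations applied to a symbol. Indexing the column by name with df[[i]] avoids the problem. The toy values below are invented for illustration, and the sketch assumes every ID has exactly one A row and one B row. Also note that tidy() comes from broom, which library(tidyverse) does not attach, so you would need library(broom) to use it as in your original code.

```r
# Toy data laid out like the question's example; the values are made up.
df <- data.frame(
  ID          = rep(1:5, each = 2),
  Treatment   = rep(c("A", "B"), times = 5),
  Effect      = c(0.1, 0.4, 0.1, 0.2, 0.4, 0.6, 0.3, 0.35, 0.2, 0.5),
  Intake      = c(1, 3, 3, 4, 1, 4, 2, 2.5, 2, 5),
  Temperature = c(20, 21, 27, 28, 21, 23, 25, 25.5, 22, 26)
)

# Sort so that the k-th A row and the k-th B row belong to the same ID.
df <- df[order(df$ID, df$Treatment), ]

results <- list()
for (i in c("Effect", "Intake", "Temperature")) {
  values <- df[[i]]  # select the column by its name; no !!sym() needed
  paired_diff <- values[df$Treatment == "A"] - values[df$Treatment == "B"]
  results[[i]] <- shapiro.test(paired_diff)  # keep the full test object
}

# Collect one p-value per column into a named numeric vector.
p_values <- sapply(results, function(res) res$p.value)
print(p_values)
```

Storing the whole test object in results means you can pull out the statistic or the p-value later, or pass each element to broom::tidy() if you prefer a one-row data frame per column.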