r, dataframe, statistics, tidyverse

How to use !!sym when performing a shapiro_test on a dataframe undergoing a for loop?


I am trying to loop over certain columns of a dataframe and, for each one, compute statistical summaries, run tests, and draw a plot. I am new to for loops in R and only partly understand !!sym. How should I use it when calculating the differences that feed into the Shapiro test?

For example, in the dataset below, I want to run a paired t-test to see whether treatment A versus B, per sample, has any influence on "Effect", "Intake", and "Temperature". First, though, I need to check whether my data are normally distributed, hence the shapiro.test in the code below.

Example dataset from an Excel sheet:

ID Treatment Effect Intake Count Temperature
1  A         0.1    1      8     20
1  B         0.4    3      9     21
2  A         0.1    3      0     27
2  B         0.2    4      5     28
3  A         0.4    1      14    21
3  B         0.6    4      4     23
...          ...    ...    ...   ...
library(tidyverse)
library(readxl)
library(broom)  # provides tidy(), which library(tidyverse) does not attach

df <- read_excel(paste0(getwd(),"/Data.xlsm"), sheet="data")

for (i in c("Effect","Intake", "Temperature")){

#other code is here for means, etc.

#code for shapiro test where i am having the issue

  mean_diff <- with(df, (!!sym(i))[Treatment == "A"] - (!!sym(i))[Treatment == "B"])
  s_test <- tidy(shapiro.test(mean_diff))

#other code to graph
}

Error I get at the mean_diff line:

Error in !sym(i) : invalid argument type

Solution

  • Here's a solution based on the tidyverse and random data (with your original variable names) since I composed it before your edit was visible.

    library(tidyverse)
    
    df <- tibble(
      Treatment = rep(c("Saline", "CNO"), each = 5),
      Poke_Time = rnorm(10),
      Retrieval_Time = rnorm(10)
    )
    
    df %>% 
      summarise(
        across(
          -Treatment, 
          function(x) {
            y <- df %>% filter(Treatment == "Saline") %>% pull(cur_column())
            z <- df %>% filter(Treatment == "CNO") %>% pull(cur_column())
            shapiro.test(y - z)$p.value
          }
        )
      )
    # A tibble: 1 × 2
      Poke_Time Retrieval_Time
          <dbl>          <dbl>
    1     0.586          0.600
    

    You can edit the code to return summaries other than p.value if you wish. You can even capture the entire output of shapiro.test by wrapping it in a list: list(shapiro.test(y - z)).
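
If you would rather keep the original for loop, a base R approach may be simpler. The `!!` operator is only interpreted by functions that support tidy evaluation (dplyr verbs, for instance). Inside base R's `with()` it is parsed as two ordinary negations applied to a symbol, which is what produces `invalid argument type`. Below is a minimal sketch that indexes the column by name with `df[[i]]` instead. It assumes made-up data in place of the Excel sheet and assumes every ID has exactly one A row and one B row; the names `paired_diff`, `results` and `p_values` are only illustrative.

```r
# Toy data mirroring the question's layout (values are made up for illustration).
set.seed(1)
df <- data.frame(
  ID          = rep(1:6, each = 2),
  Treatment   = rep(c("A", "B"), times = 6),
  Effect      = runif(12),
  Intake      = rnorm(12, mean = 3),
  Temperature = rnorm(12, mean = 23, sd = 3)
)

# Order rows by ID and Treatment so A and B values line up sample by sample.
df <- df[order(df$ID, df$Treatment), ]

results <- list()
for (i in c("Effect", "Intake", "Temperature")) {
  # df[[i]] extracts the column named by the string in i; no !!sym() needed.
  a <- df[[i]][df$Treatment == "A"]
  b <- df[[i]][df$Treatment == "B"]
  paired_diff <- a - b

  # Store the full htest object so other fields (statistic, method) stay available.
  results[[i]] <- shapiro.test(paired_diff)
}

# Collect the p-values into a named numeric vector.
p_values <- sapply(results, function(x) x$p.value)
print(p_values)
```

The subtraction only pairs values correctly if the A and B vectors are in the same ID order, which is why the rows are sorted first. If some IDs are missing a treatment, reshape the data to one row per ID before taking differences.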