Comparing multiple variables in more than two groups with t.test


I tried to run t-tests comparing each of time1, time2, time3, ... against threshold. Here is my data frame:

time.df1<-data.frame("condition" =c("A","B","C","A","C","B"), 
"time1" = c(1,3,2,6,2,3) ,
"time2" = c(1,1,2,8,2,9) ,
"time3" = c(-2,12,4,1,0,6),
"time4" = c(-8,3,2,1,9,6),
"threshold" = c(-2,3,8,1,9,-3))

and I tried to compare each pair of columns with:

time.df1%>% 
select_if(is.numeric)  %>%
purrr::map_df(~ broom::tidy(t.test(. ~ threshold)))

However, I got this error message:

 Error in eval(predvars, data, env) : object 'threshold' not found

So I tried another way (which may be wrong):

time.df2<-time.df1%>%gather(TF,value,time1:time4)
time.df2%>% group_by(condition) %>% do(tidy(t.test(value~TF, data=.)))

Sadly, I got this error, even when I limited condition to only two levels (A, B):

 Error in t.test.formula(value ~ TF, data = .) : grouping factor must have exactly 2 levels

I want to loop a t-test over each time column against the threshold column, per condition, and then use broom::tidy to get the results in tidy format. My approaches clearly aren't working; any advice on improving my code is much appreciated.


Solution

  • An alternative route would be to define a function with the required options for t.test() up front, then create a data frame for each pair of variables (i.e. each combination of 'time*' and 'threshold'), nest these into a list column, and use map() with the relevant functions from 'broom' to simplify the output.

    library(tidyverse)
    library(broom)
    
    ttestfn <- function(data, ...){
      # two-sample t-test of one 'time*' column (resp) against 'threshold';
      # extra arguments (e.g. var.equal = TRUE) are passed on to t.test()
      res <- t.test(data$resp, data$threshold, ...)
      return(res)
    }
    
    df2 <-   
    time.df1 %>% 
      gather(time, "resp", - threshold, -condition) %>% 
      group_by(time) %>% 
      nest() %>% 
      mutate(ttests = map(data, ttestfn),
             glances = map(ttests, glance))
    # df2 has data frames, t-test objects and glance summaries 
    # as separate list columns
    

    Now it's easy to query this object to extract what you want:

    # '.drop' is deprecated in tidyr >= 1.0, so drop the other list columns explicitly
    df2 %>% 
      select(time, glances) %>% 
      unnest(glances)
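
    If the aim really is one test per condition and per time column (one reading of the question), the same nest/map pattern could be extended by grouping on both variables. This is only a sketch: the name `per_condition` is mine, I've used pivot_longer() rather than gather(), and with two rows per cell in the example data the tests are illustrative at best.

    ```r
    library(dplyr)
    library(tidyr)
    library(purrr)
    library(broom)

    time.df1 <- data.frame(
      condition = c("A", "B", "C", "A", "C", "B"),
      time1 = c(1, 3, 2, 6, 2, 3),
      time2 = c(1, 1, 2, 8, 2, 9),
      time3 = c(-2, 12, 4, 1, 0, 6),
      time4 = c(-8, 3, 2, 1, 9, 6),
      threshold = c(-2, 3, 8, 1, 9, -3)
    )

    # one row per (condition, time) cell, each holding a tidied Welch t-test
    # of that cell's 'resp' values against its 'threshold' values
    per_condition <- time.df1 %>%
      pivot_longer(starts_with("time"), names_to = "time", values_to = "resp") %>%
      group_by(condition, time) %>%
      nest() %>%
      ungroup() %>%
      mutate(ttest = map(data, ~ tidy(t.test(.x$resp, .x$threshold)))) %>%
      select(-data) %>%
      unnest(ttest)

    per_condition
    ```

    The result has one row per condition and time column, with the usual tidy() columns (estimate1, estimate2, statistic, p.value and so on) alongside.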
    

    However, it's unclear to me what you want to do with 'condition', so I'm wondering if it is more straightforward to reframe the question in terms of a general linear model (GLM), as camille suggested in the comments: ANOVA is part of the GLM family.

    Reshape the data and define 'threshold' as the reference level of the 'time' factor; the default 'treatment' contrasts used by R will then compare each time to 'threshold':

    time.df2 <- 
      time.df1 %>% 
      gather(key = "time", value = "resp", -condition) %>% 
      mutate(time = fct_relevel(time, "threshold")) # define 'threshold' as baseline
    
    fit.aov <- aov(resp ~ condition * time, data = time.df2)
    summary(fit.aov)
    summary.lm(fit.aov) # coefficients and p-values
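
    Since the question asks for tidy output, here is a rough sketch of how the coefficient table could be pulled out with broom. I'm assuming it is acceptable to fit the same formula with lm() directly; `fit.lm` and `time_vs_threshold` are names I made up. Note that because the model includes the condition:time interaction, the 'time*' main-effect rows compare each time to 'threshold' within the reference condition (A) only.

    ```r
    library(dplyr)
    library(tidyr)
    library(forcats)
    library(broom)

    time.df1 <- data.frame(
      condition = c("A", "B", "C", "A", "C", "B"),
      time1 = c(1, 3, 2, 6, 2, 3),
      time2 = c(1, 1, 2, 8, 2, 9),
      time3 = c(-2, 12, 4, 1, 0, 6),
      time4 = c(-8, 3, 2, 1, 9, 6),
      threshold = c(-2, 3, 8, 1, 9, -3)
    )

    time.df2 <- time.df1 %>%
      pivot_longer(-condition, names_to = "time", values_to = "resp") %>%
      mutate(time = fct_relevel(time, "threshold")) # 'threshold' as baseline

    # same formula as the aov() call, fitted with lm()
    fit.lm <- lm(resp ~ condition * time, data = time.df2)

    # keep only the time-vs-threshold contrasts (terms such as "timetime1")
    time_vs_threshold <- tidy(fit.lm) %>%
      filter(grepl("^time", term))

    time_vs_threshold
    ```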
    

    Of course this assumes that all subjects are independent (i.e. there are no repeated measures). If not, then you'll need to move on to more complicated procedures. Anyway, moving to appropriate GLMs for the study design should help minimise the pitfalls of doing multiple t-tests on the same data set.
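
    If you do stay with separate t-tests, one common way to soften the multiple-testing problem is to adjust the p-values, for example with Holm's method via p.adjust(). A minimal base R sketch, assuming unpaired Welch tests as above (`time_cols`, `p_raw` and `p_holm` are just illustrative names):

    ```r
    time.df1 <- data.frame(
      condition = c("A", "B", "C", "A", "C", "B"),
      time1 = c(1, 3, 2, 6, 2, 3),
      time2 = c(1, 1, 2, 8, 2, 9),
      time3 = c(-2, 12, 4, 1, 0, 6),
      time4 = c(-8, 3, 2, 1, 9, 6),
      threshold = c(-2, 3, 8, 1, 9, -3)
    )

    time_cols <- c("time1", "time2", "time3", "time4")

    # raw p-value of each time column against 'threshold'
    p_raw <- sapply(time_cols, function(col) {
      t.test(time.df1[[col]], time.df1$threshold)$p.value
    })

    # Holm adjustment across the four tests
    p_holm <- p.adjust(p_raw, method = "holm")

    data.frame(time = time_cols, p_raw = p_raw, p_holm = p_holm, row.names = NULL)
    ```

    This does not address repeated measures; it only controls the family-wise error rate across the tests that were run.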