
How to write a function that conducts paired t-tests on all group/variable combinations in a data frame


I have a data frame similar to the one below (named data):

ID <- data.frame(ID=rep(c(12,122,242,329,595,130,145,245,654,878),each=5))
Var <- data.frame(Variable=c("Copper","Iron","Lead","Zinc","CaCO"))
n <- 10
Variable <- do.call("rbind",replicate(n,Var,simplify=F))
Location <- rep(c("Alpha","Beta","Gamma"), times=c(20,20,10))
Location <- data.frame(Location)
set.seed(1)
FirstPt<- data.frame(FirstPt=sample(1:100,50,replace=T))
LastPt <- data.frame(LastPt=sample(1:100,50,replace=T))
First3<- data.frame(First3=sample(1:100,50,replace=T))
First5<- data.frame(First5=sample(1:100,50,replace=T))
First7<- data.frame(First7=sample(1:100,50,replace=T))
First10<- data.frame(First10=sample(1:100,50,replace=T))
Last3<- data.frame(Last3=sample(1:100,50,replace=T))
Last5<- data.frame(Last5=sample(1:100,50,replace=T))
Last7<- data.frame(Last7=sample(1:100,50,replace=T))
Last10<- data.frame(Last10=sample(1:100,50,replace=T))
data <- cbind(ID,Location,Variable,FirstPt,LastPt,First3,First5,First7,
              First10,Last3,Last5,Last7,Last10)

This may be a two-part question, but I want to write a function that groups together all rows with the same Variable (for instance, all the observations that are Copper) and conducts a paired t-test between every possible pair of the numeric columns (FirstPt:Last10). I want it to return the p-values in a data frame like this:

Test                        P-Value
FirstPt.vs.LastPt             …
FirstPt.vs.First3             … 
etc.                          … 

This will likely be a second function, but I also want to do this after the observations are grouped by Location, so that the output data frame looks like this:

Test                                   P-Value
FirstPt.vs.LastPt.InAlpha
FirstPt.vs.LastPt.InBeta        
etc.
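Both parts of the question can also be handled by a single base-R function. The sketch below is a hypothetical helper (paired_t_all_pairs and its arguments are illustrative names, not from the question or the answer). It splits the data by one or more grouping columns, forms every pair of value columns with combn(), and runs t.test(..., paired = TRUE) on each pair. It assumes that within each group the rows line up, i.e. the i-th value of every column belongs to the same ID, which is what makes the pairing meaningful. A comparison where t.test() stops with an error (for example, constant differences in a two-row group such as Gamma) is returned as NA.

```r
# Hypothetical helper: paired t-test on every pair of value columns within
# each group, returned as a data frame of labelled p-values.
paired_t_all_pairs <- function(df, group_cols, value_cols) {
  # One subset per observed combination of the grouping columns; with two
  # grouping columns the subset names look like "CaCO.Alpha"
  subsets <- split(df, df[group_cols], drop = TRUE)
  # Every unordered pair of value columns, one pair per matrix column
  pairs <- combn(value_cols, 2)
  rows <- lapply(names(subsets), function(g) {
    sub <- subsets[[g]]
    p <- apply(pairs, 2, function(pr) {
      # Assumes rows are aligned, so element i of both columns is one pair
      tryCatch(t.test(sub[[pr[1]]], sub[[pr[2]]], paired = TRUE)$p.value,
               # e.g. "data are essentially constant" in very small groups
               error = function(e) NA_real_)
    })
    data.frame(Test = paste(pairs[1, ], "vs", pairs[2, ], g, sep = "."),
               P.Value = p, stringsAsFactors = FALSE)
  })
  # Stack the per-group results into one data frame
  do.call(rbind, rows)
}
```

With the question's data, paired_t_all_pairs(data, "Variable", setdiff(names(data), c("ID", "Location", "Variable"))) covers the first table, and passing c("Variable", "Location") as the grouping columns covers the second, with labels such as FirstPt.vs.LastPt.CaCO.Alpha. With 10 numeric columns there are choose(10, 2) = 45 column pairs per group.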

Solution

  • You can do both of these with one function:

    library(tidyverse)
    
    t.test.by.group.combos <- function(.data, groups){
      # names of the grouping columns as strings, e.g. c("Variable", "Location")
      by <- map_chr(groups, rlang::as_label)
      .data %>%
        group_by(!!!groups) %>%
        select_if(is.integer) %>%
        group_split() %>%
        map(.,
          ~pivot_longer(., cols = (FirstPt:Last10), names_to = "name", values_to = "val") %>%
            nest(data = val) %>%
            full_join(.,.,by = by) %>%
            filter(name.x != name.y) %>%
            mutate(test = paste(name.x, "vs",name.y, !!!groups, sep = "."),
                   p.value = map2_dbl(data.x,data.y, ~t.test(unlist(.x), unlist(.y))$p.value)) %>%
            select(test, p.value) %>%
            filter(!duplicated(p.value))
        ) %>%
        bind_rows() 
    }
    
    
    t.test.by.group.combos(data, vars(Variable))
    #> # A tibble: 225 x 2
    #>    test                    p.value
    #>    <chr>                     <dbl>
    #>  1 FirstPt.vs.LastPt.CaCO    0.511
    #>  2 FirstPt.vs.First3.CaCO    0.184
    #>  3 FirstPt.vs.First5.CaCO    0.494
    #>  4 FirstPt.vs.First7.CaCO    0.354
    #>  5 FirstPt.vs.First10.CaCO   0.893
    #>  6 FirstPt.vs.Last3.CaCO     0.496
    #>  7 FirstPt.vs.Last5.CaCO     0.909
    #>  8 FirstPt.vs.Last7.CaCO     0.439
    #>  9 FirstPt.vs.Last10.CaCO    0.146
    #> 10 LastPt.vs.First3.CaCO     0.578
    #> # … with 215 more rows
    
    t.test.by.group.combos(data, vars(Variable, Location))
    #> # A tibble: 674 x 2
    #>    test                          p.value
    #>    <chr>                           <dbl>
    #>  1 FirstPt.vs.LastPt.CaCO.Alpha    0.850
    #>  2 FirstPt.vs.First3.CaCO.Alpha    0.822
    #>  3 FirstPt.vs.First5.CaCO.Alpha    0.895
    #>  4 FirstPt.vs.First7.CaCO.Alpha    0.810
    #>  5 FirstPt.vs.First10.CaCO.Alpha   0.645
    #>  6 FirstPt.vs.Last3.CaCO.Alpha     0.870
    #>  7 FirstPt.vs.Last5.CaCO.Alpha     0.465
    #>  8 FirstPt.vs.Last7.CaCO.Alpha     0.115
    #>  9 FirstPt.vs.Last10.CaCO.Alpha    0.474
    #> 10 LastPt.vs.First3.CaCO.Alpha     0.991
    #> # … with 664 more rows
    

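    This is a fairly long function, but in outline: we group by the groups argument, keep the grouping columns plus any integer columns, and split the data frame into one piece per group. Within each piece we pivot the numeric columns to long format, nest each column's values, self-join on the grouping columns to get every pair of columns, and run t.test() on each pair. Lastly, we bind the results for all groups back into one data frame. Note that t.test() is called with its defaults, so this is a Welch two-sample test rather than the paired test the question asks for (that needs paired = TRUE). Also, filter(!duplicated(p.value)) is what removes reversed pairs such as LastPt.vs.FirstPt, so it would also drop two different comparisons that happened to share a p-value.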
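To get the paired tests the question asks for while keeping the answer's structure, a minimal adaptation is sketched below. The name paired.t.test.by.group.combos is hypothetical, and so is returning NA (via purrr::possibly()) when t.test() fails in a very small group. Compared with the function above, t.test() gets paired = TRUE, the pivot uses every non-grouping column instead of the hard-coded FirstPt:Last10, and reversed pairs are dropped by comparing column positions instead of with !duplicated(p.value). Values are still paired by row order within each group, so the p-values will generally differ from the Welch-test output shown above.

```r
library(dplyr)
library(tidyr)
library(purrr)

# Hypothetical adaptation of the answer's function: paired t-tests, reversed
# pairs removed by column position instead of by duplicated p-values.
paired.t.test.by.group.combos <- function(df, groups) {
  # Grouping column names as strings, e.g. c("Variable", "Location")
  by <- map_chr(groups, rlang::as_label)
  # Original column order, used to keep each unordered pair exactly once
  col_order <- names(df)
  # Paired t-test p-value, or NA if t.test() stops (e.g. constant differences)
  safe_p <- possibly(function(x, y) t.test(x, y, paired = TRUE)$p.value,
                     otherwise = NA_real_)
  df %>%
    group_by(!!!groups) %>%
    select_if(is.integer) %>%        # grouping columns are kept automatically
    group_split() %>%
    map(function(d) {
      d %>%
        pivot_longer(cols = -all_of(by), names_to = "name", values_to = "val") %>%
        nest(data = val) %>%           # one row per column, values in row order
        full_join(., ., by = by) %>%   # every (column, column) combination
        filter(match(name.x, col_order) < match(name.y, col_order)) %>%
        mutate(test = paste(name.x, "vs", name.y, !!!groups, sep = "."),
               p.value = map2_dbl(data.x, data.y,
                                  function(x, y) safe_p(unlist(x), unlist(y)))) %>%
        select(test, p.value)
    }) %>%
    bind_rows()
}
```

It is called the same way as the original, e.g. paired.t.test.by.group.combos(data, vars(Variable, Location)). In recent dplyr versions the self-join on repeated keys may emit a many-to-many warning; the original function performs the same join.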