Tags: r, loops, statistics, t-test

Performing multiple two-sample t-tests on two lists of data frames that have many columns


I have two lists with four data frames each. The data frames in both lists ("loc_list_future" and "loc_list_2019") have 33 columns: "Year", followed by mean precipitation values for 32 different climate models.

The data frames in loc_list_future look like this, but with 32 Model columns in total and with data running through Year 2059:

Year     Model 1    Model 2      Model 3   ...Model 32
2020    714.1101    686.5888    1048.4274       
2021   1018.0095    766.9161     514.2700      
2022    756.7066    902.2542     906.2877       
2023    906.9675    919.5234     647.6630       
2024    767.4008    861.1275     700.2612     
2025    876.1538    738.8370     664.3342       
2026    781.5092    801.2387     743.8965     
2027    876.3522    819.4323     675.3022       
2028    626.9468    927.0774     696.1884       
2029    752.4084    824.7682     835.1566  
...
2059   

The data frames in loc_list_2019 cover the years 2006 to 2019 but otherwise look the same.

Each data frame represents a geographic location, and the two lists have the same four locations but one list is for 2006-2019 values and the other is for future values.

I would like to run two-sample t-tests that compare the 2006-19 values with the future values for each model at each location.

I have another list (loc_list_OBS) whose data frames have only two columns, "Year" and "Mean_Precip". This is observed data rather than model output, which is why there is only one column for mean precipitation. I have code (see below) that runs two-sample t-tests for the observed data (loc_list_OBS) against the future data (loc_list_future), but I am unsure how to change it to run t-tests between the two lists that have 32 models each.

myfun <- function(x,y)
{
  OBS_Data <- x$Mean_Precip
  #Empty list
  List <- list()
  #Now loop
  for(i in 2:dim(y)[2])
  {
    #Label
    val <- names(y[,i,drop=F])
    Future_Data <- y[,i]
    #Test
    test <- t.test(OBS_Data, Future_Data, alternative = "two.sided") 
    #Save
    List[[i-1]] <- test
    names(List)[i-1] <- val
  }
  return(List)
}

t.stat <- mapply(FUN = myfun,x=loc_list_OBS,y=loc_list_future, SIMPLIFY = FALSE) 

Solution

  • I would suggest the following approach. I have created dummy data similar to what you have. Here is the code:

    #Data before
    dfb <- structure(list(Year = 2010:2019, Model.1 = c(614.1101, 918.0095, 
    656.7066, 806.9675, 667.4008, 776.1538, 681.5092, 776.3522, 526.9468, 
    652.4084), Model.2 = c(586.5888, 666.9161, 802.2542, 819.5234, 
    761.1275, 638.837, 701.2387, 719.4323, 827.0774, 724.7682), Model.3 = c(948.4274, 
    414.27, 806.2877, 547.663, 600.2612, 564.3342, 643.8965, 575.3022, 
    596.1884, 735.1566)), class = "data.frame", row.names = c(NA, 
    -10L))
    #Data after
    dfa <- structure(list(Year = 2020:2029, Model.1 = c(714.1101, 1018.0095, 
    756.7066, 906.9675, 767.4008, 876.1538, 781.5092, 876.3522, 626.9468, 
    752.4084), Model.2 = c(686.5888, 766.9161, 902.2542, 919.5234, 
    861.1275, 738.837, 801.2387, 819.4323, 927.0774, 824.7682), Model.3 = c(1048.4274, 
    514.27, 906.2877, 647.663, 700.2612, 664.3342, 743.8965, 675.3022, 
    696.1884, 835.1566)), class = "data.frame", row.names = c(NA, 
    -10L))
    

    Now build the two lists:

    #Data for lists
    L.before <- list(df1=dfb,df2=dfb,df3=dfb,df4=dfb)
    L.after <- list(df1=dfa,df2=dfa,df3=dfa,df4=dfa)
    

    The function:

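    #Function
    #Assumes x and y have Year in column 1 and the same model columns in the same order
    myfun <- function(x,y)
    {
      #Create empty list (one t-test result per model column)
      List <- list()
      #Loop over the model columns, skipping Year
      for(i in 2:ncol(x))
      {
        name <- names(x)[i]
        before <- x[,i]
        after <- y[,i]
        #Test (t.test defaults to the Welch two-sample test)
        test <- t.test(before, after, alternative = "two.sided") 
        #Save
        List[[i-1]] <- test
        names(List)[i-1] <- name
      }
      return(List)
    }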
    

    The application:

    #Apply
    t.stat <- mapply(FUN = myfun,x=L.before,y=L.after, SIMPLIFY = FALSE)
    

    Output for the first location:

    t.stat[[1]]
    
    $Model.1
    
        Welch Two Sample t-test
    
    data:  before and after
    t = -1.9966, df = 18, p-value = 0.06122
    alternative hypothesis: true difference in means is not equal to 0
    95 percent confidence interval:
     -205.224021    5.224021
    sample estimates:
    mean of x mean of y 
     707.6565  807.6565 
    
    
    $Model.2
    
        Welch Two Sample t-test
    
    data:  before and after
    t = -2.8054, df = 18, p-value = 0.0117
    alternative hypothesis: true difference in means is not equal to 0
    95 percent confidence interval:
     -174.88934  -25.11066
    sample estimates:
    mean of x mean of y 
     724.7764  824.7764 
    
    
    $Model.3
    
        Welch Two Sample t-test
    
    data:  before and after
    t = -1.4829, df = 18, p-value = 0.1554
    alternative hypothesis: true difference in means is not equal to 0
    95 percent confidence interval:
     -241.67613   41.67613
    sample estimates:
    mean of x mean of y 
     643.1787  743.1787 
    

    Let me know if that works for you!
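
With 32 models at four locations, printing every test object gets hard to read, so it may help to collect the key numbers into one table. The following is only a sketch under some assumptions: the dummy data (`make_df`), the helper name `compare_models`, and the column names of `summary_df` are hypothetical and not part of the original answer. It also matches model columns by name rather than by position, which is a design choice in case the two lists do not share the same column order.

```r
# Hypothetical dummy data: two locations, two models, ten years each
set.seed(1)
make_df <- function(years, shift) {
  data.frame(
    Year    = years,
    Model.1 = rnorm(length(years), mean = 700 + shift, sd = 100),
    Model.2 = rnorm(length(years), mean = 750 + shift, sd = 100)
  )
}
L.before <- list(df1 = make_df(2010:2019, 0),   df2 = make_df(2010:2019, 0))
L.after  <- list(df1 = make_df(2020:2029, 100), df2 = make_df(2020:2029, 100))

# Same idea as myfun, but model columns are paired by name instead of position
compare_models <- function(x, y) {
  models <- setdiff(intersect(names(x), names(y)), "Year")
  tests <- lapply(models, function(m) {
    t.test(x[[m]], y[[m]], alternative = "two.sided")
  })
  names(tests) <- models
  tests
}

# One list of tests per location (Map keeps the location names)
t.stat <- Map(compare_models, L.before, L.after)

# Flatten to one row per location/model combination
summary_df <- do.call(rbind, lapply(names(t.stat), function(loc) {
  tests <- t.stat[[loc]]
  data.frame(
    location = loc,
    model    = names(tests),
    t_stat   = unname(sapply(tests, function(tt) tt$statistic)),
    df       = unname(sapply(tests, function(tt) tt$parameter)),
    p.value  = unname(sapply(tests, function(tt) tt$p.value)),
    row.names = NULL
  )
}))

print(summary_df)
```

The same flattening step should work on the `t.stat` object produced by `myfun` above, since both are lists of `t.test` results named by location and then by model.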