Tags: r, loops, mapply, t-test

Performing multiple two sample t-tests on two lists of data frames


I have two lists with four data frames each. The data frames in the first list ("loc_list_OBS") have only two columns, "Year" and "Mean_Precip", while the data frames in the second list ("loc_list_future") have 33 columns: "Year" followed by mean precipitation values for 32 different models.

The data frames in loc_list_OBS look like this, with the data running through Year 2005:

Year     Mean_Precip
1950    799.1309
1951    748.0239
1952    619.7572
1953    799.9263
1954    680.9194
1955    766.2304
1956    599.5365
1957    717.8912
1958    739.4901
1959    707.1130
...     ....
2005    ....

The data frames in loc_list_future look like this, but with 32 Model columns in total and data running through Year 2059:

Year   Model 1      Model 2      Model 3    ...... Model 32
2020    714.1101    686.5888    1048.4274
2021    1018.0095    766.9161     514.2700
2022    756.7066    902.2542     906.2877
2023    906.9675    919.5234     647.6630
2024    767.4008    861.1275     700.2612
2025    876.1538    738.8370     664.3342
2026    781.5092    801.2387     743.8965
2027    876.3522    819.4323     675.3022
2028    626.9468    927.0774     696.1884
2029    752.4084    824.7682     835.1566
....    .....       .....         .....
2059    .....       .....         .....

Each data frame represents a geographic location, and the two lists have the same four locations but one list is for observed values and the other is for predicted future values.

I would like to run two sample t-tests that compare the observed values with the predicted future values for each model at each location. Put another way, I want to compare the first data frame in each list, then the second data frame in each list, and the same with the third and fourth data frames.

Here is the code I have used:

t_stat = NULL
mapply(FUN = function(f, o) {
 t_stat <- t.test(o$Mean_Precip, f, alternative = "two.sided")  
}, f = loc_list_ttest, o = loc_list_OBS, SIMPLIFY = FALSE)
t_stat

This code gives me only four t-test outputs, each comparing the "Mean_Precip" column in the observed data with what appears to be a combination of all the models in the future data. However, I need a t-test for each model at each location. Can anyone figure out how to do this?


Solution

  • One way to do this is to write a function that loops over the model columns of the second data frame and runs a t-test of each one against the Mean_Precip column of the first data frame, saving the results in a list. As I understand it, you want to pair the data frames by position and get one t-test per model. Applying the function to both lists gives four lists, one per location, each holding all the t-tests for that location. I have created dummy data based on what you shared:

    #Data
    df <- structure(list(Year = c(1950L, 1951L, 1952L, 1953L, 1954L, 1955L, 
    1956L, 1957L, 1958L, 1959L, 2005L), Mean_Precip = c(799.1309, 
    748.0239, 619.7572, 799.9263, 680.9194, 766.2304, 599.5365, 717.8912, 
    739.4901, 707.113, 707.113)), class = "data.frame", row.names = c(NA, 
    -11L))
    #Data2
    df1 <- structure(list(Year = c(2020L, 2021L, 2022L, 2023L, 2024L, 2025L, 
    2026L, 2027L, 2028L, 2029L, 2059L), Model.1 = c(714.1101, 1018.0095, 
    756.7066, 906.9675, 767.4008, 876.1538, 781.5092, 876.3522, 626.9468, 
    752.4084, 752.4084), Model.2 = c(686.5888, 766.9161, 902.2542, 
    919.5234, 861.1275, 738.837, 801.2387, 819.4323, 927.0774, 824.7682, 
    824.7682), Model.3 = c(1048.4274, 514.27, 906.2877, 647.663, 
    700.2612, 664.3342, 743.8965, 675.3022, 696.1884, 835.1566, 835.1566
    )), class = "data.frame", row.names = c(NA, -11L))
    

    Now, we will create the lists (you should already have these):

    #Lists
    List1 <- list(df1=df,df2=df,df3=df,df4=df)
    List2 <- list(df1=df1,df2=df1,df3=df1,df4=df1)
    

    Here is the function:

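    #Function
    myfun <- function(x, y)
    {
      #Observed values from the first data frame
      l <- x$Mean_Precip
      #Empty list to collect one test per model
      List <- list()
      #Loop over the model columns (column 1 is Year)
      for(i in 2:ncol(y))
      {
        #Label: name of the current model column
        val <- names(y)[i]
        r <- y[, i]
        #Welch two-sample t-test: observed vs. this model
        test <- t.test(l, r, alternative = "two.sided")
        #Save the test under the model's name
        List[[val]] <- test
      }
      return(List)
    }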
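
    The same loop can also be written more compactly with lapply. This is only a minimal sketch: `compare_models`, `obs`, and `fut` are hypothetical names, the numbers are made-up toy values, and it assumes the only non-model column in the future data frame is called Year.

    ```r
    # Hypothetical toy data: one observed and one future data frame
    obs <- data.frame(Year = 1950:1954,
                      Mean_Precip = c(799.1, 748.0, 619.8, 799.9, 680.9))
    fut <- data.frame(Year = 2020:2024,
                      Model.1 = c(714.1, 1018.0, 756.7, 907.0, 767.4),
                      Model.2 = c(686.6, 766.9, 902.3, 919.5, 861.1))

    # Hypothetical helper: one Welch t-test per model column
    compare_models <- function(x, y) {
      # Drop the Year column by name so only model columns remain
      models <- y[setdiff(names(y), "Year")]
      # lapply over a data frame iterates over its columns and keeps their names
      lapply(models, function(r) t.test(x$Mean_Precip, r, alternative = "two.sided"))
    }

    res <- compare_models(obs, fut)
    names(res)
    ```

    Dropping Year by name rather than by position is a design choice here: it should keep working if the column order changes, as long as the column is still called Year.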
    

    Finally, we apply:

    #Apply
    t.stat <- mapply(FUN = myfun,x=List1,y=List2,SIMPLIFY = FALSE)
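
    Note that mapply pairs the two lists by position, so the comparison is only meaningful if both lists hold the locations in the same order. A minimal sketch of a guard you could run first, assuming both lists are named by location; `check_aligned`, `a`, and `b` are hypothetical names used only for illustration:

    ```r
    # Hypothetical helper: stop early if the two lists are not aligned by name
    check_aligned <- function(a, b) {
      stopifnot(length(a) == length(b),
                !is.null(names(a)),
                identical(names(a), names(b)))
      invisible(TRUE)
    }

    # Toy lists standing in for the observed and future lists
    a <- list(loc1 = 1, loc2 = 2)
    b <- list(loc1 = 3, loc2 = 4)
    check_aligned(a, b)
    ```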
    

    The output is a list of lists, and you can inspect each element as follows:

    t.stat[[1]]
    

    There you will find the results of comparing the first data frame against each model column of the second data frame:

    Output:

    $Model.1
    
        Welch Two Sample t-test
    
    data:  l and r
    t = -2.2645, df = 16.448, p-value = 0.03738
    alternative hypothesis: true difference in means is not equal to 0
    95 percent confidence interval:
     -165.949710   -5.657818
    sample estimates:
    mean of x mean of y 
     716.8302  802.6339 
    
    
    $Model.2
    
        Welch Two Sample t-test
    
    data:  l and r
    t = -3.5901, df = 19.56, p-value = 0.001881
    alternative hypothesis: true difference in means is not equal to 0
    95 percent confidence interval:
     -170.75516  -45.13574
    sample estimates:
    mean of x mean of y 
     716.8302  824.7756 
    
    
    $Model.3
    
        Welch Two Sample t-test
    
    data:  l and r
    t = -0.72149, df = 13.829, p-value = 0.4826
    alternative hypothesis: true difference in means is not equal to 0
    95 percent confidence interval:
     -138.01368   68.59334
    sample estimates:
    mean of x mean of y 
     716.8302  751.5403
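
    To scan all results at once rather than printing each test, the p-values can be pulled out of the nested list. A minimal sketch, assuming a result shaped like t.stat (a named list of locations, each holding a named list of t.test results); the toy data and the names `obs_list`, `fut_list`, `res`, and `p_values` are hypothetical:

    ```r
    # Hypothetical toy data for two locations
    obs <- data.frame(Year = 1950:1954,
                      Mean_Precip = c(799.1, 748.0, 619.8, 799.9, 680.9))
    fut <- data.frame(Year = 2020:2024,
                      Model.1 = c(714.1, 1018.0, 756.7, 907.0, 767.4),
                      Model.2 = c(686.6, 766.9, 902.3, 919.5, 861.1))
    obs_list <- list(loc1 = obs, loc2 = obs)
    fut_list <- list(loc1 = fut, loc2 = fut)

    # Nested result: one list per location, one t.test per model column
    # (y[-1] drops the first column, assumed to be Year)
    res <- Map(function(x, y) lapply(y[-1], function(r) t.test(x$Mean_Precip, r)),
               obs_list, fut_list)

    # Matrix of p-values: rows are models, columns are locations
    p_values <- sapply(res, function(loc) sapply(loc, function(tt) tt$p.value))
    p_values
    ```

    The same pattern should work for any other component of a t.test result, for example `tt$statistic` or `tt$estimate`.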