
Creating a matrix with t-statistics in R


I have a panel data set and want to create a matrix similar to a correlation matrix, but with each cell showing the difference between the two group means from a t-test together with the t-statistic.

Using the ToothGrowth data, I first split each supp group into subgroups by dose, and I want to calculate the t-statistic for every possible pair of subgroups.

I want my t-test matrix to look as follows:

          VC_all  VC_0.5  VC_1  VC_all  VC_0.5  VC_1  OJ_all         OJ_0.5  OJ_1
VC_all                                                -4 ( -1.92 )
VC_0.5
VC_1
VC_all
VC_0.5
VC_1
OJ_all
OJ_0.5
OJ_1

As an example, I filled in one cell with the following code:

library(dplyr)  # provides filter()

t_test <- t.test(x = filter(ToothGrowth, supp == "VC")$len,
                 y = filter(ToothGrowth, supp == "OJ")$len, var.equal = TRUE)

df["VC_all", "OJ_all"] <- paste(round(t_test$estimate[1] - t_test$estimate[2]),
                                "(", round(t_test$statistic, 2), ")")

Is there a faster way to do this and calculate the t-statistics for every pair of groups?

Solution

  • You can use this

    # generate example data: three columns, one per group
    df <- data.frame(matrix(rnorm(100 * 3), ncol = 3))
    # name the columns
    names(df) <- c("a", "b", "c")

    # or, to use your own data (one column per group):
    # df <- name_of_your_dataframe

    # character matrix for the results, with the groups as row and column names
    results <- matrix(NA_character_, nrow = ncol(df), ncol = ncol(df),
                      dimnames = list(names(df), names(df)))
    # between which columns do we need to run t-tests?
    to_estimate <- t(combn(names(df), 2))
    # one "difference (t-statistic)" string per pair of columns
    cells <- apply(to_estimate, 1, function(to_estimate_i) {
      t_results <- t.test(df[, to_estimate_i[1]], df[, to_estimate_i[2]])
      paste0(round(t_results$estimate[1] - t_results$estimate[2], 2),
             " (", round(t_results$statistic, 2), ")")
    })
    # combn() lists the pairs in the column-major order of the lower triangle,
    # so fill that triangle first and transpose to move the values to the upper one
    # (filling upper.tri() directly misplaces cells once there are 4+ columns)
    results[lower.tri(results)] <- cells
    results <- t(results)
    # copy upper to lower
    results[lower.tri(results)] <- t(results)[lower.tri(results)]
    

    All you need to do is replace df with your own data frame, which should have one numeric column per group.
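
The groups in the question have different sizes (a whole supplement versus a single dose), so they do not fit into the columns of one data frame. A minimal sketch of the same idea applied to ToothGrowth, using a named list of groups instead, might look like this. The names `groups`, `t_cell` and `t_matrix` are made up for illustration; the sketch assumes all three doses in the data are wanted, and it keeps `var.equal = TRUE` and the cell format from the question.

```r
# Groups: each supplement overall, plus each supplement/dose combination.
by_supp <- split(ToothGrowth$len, ToothGrowth$supp)
names(by_supp) <- paste0(names(by_supp), "_all")   # "OJ_all", "VC_all"
by_dose <- split(ToothGrowth$len,
                 list(ToothGrowth$supp, ToothGrowth$dose), sep = "_")
groups <- c(by_supp, by_dose)                      # named list of numeric vectors

# Hypothetical helper: "difference ( t-statistic )" for group x versus group y,
# formatted the same way as the single cell in the question.
t_cell <- function(x, y) {
  tt <- t.test(x, y, var.equal = TRUE)
  paste(round(unname(tt$estimate[1] - tt$estimate[2])),
        "(", round(unname(tt$statistic), 2), ")")
}

# Fill every off-diagonal cell: row group minus column group.
n <- length(groups)
t_matrix <- matrix(NA_character_, n, n,
                   dimnames = list(names(groups), names(groups)))
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    if (i != j) t_matrix[i, j] <- t_cell(groups[[i]], groups[[j]])
  }
}

t_matrix["VC_all", "OJ_all"]   # the cell filled in by hand in the question
```

Looping over every ordered pair runs each test twice, which is cheap here and keeps the sign of each cell tied to "row minus column". Note that nested groups such as VC_all and VC_0.5 share observations, so a two-sample t-test between them should be read with caution.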