
R: Independent t-tests for multiple by groups


I have a dataset named 'dat' with 5 columns: month; mean0; sd0; mean1; sd1. It looks like the following (but with numbers):

month  mean0  sd0  mean1  sd1
1      ...
2      ...
3      ...
..
48     ...

I would like to use an independent (not paired) t-test to compare mean0 and mean1 for every month from 1 to 48. Ideally, the output would go into another data frame, called 'dat1', with columns for the t-statistic, the degrees of freedom (DF) and the p-value, like so:

month  t-statistic  DF   p-value
1      ...
2      ...
3      ...
..
48     ...

I have tried using the dplyr and broom packages, but can't seem to figure it out. Any help would be appreciated.


Solution

  • You'll also need the sample size (n) for each group, not just the means and SDs. The tsum.test function from the BSDA package runs a t-test directly from these summary statistics, so you don't have to write your own function.

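    There remains the larger question of whether it is advisable to run this many separate comparisons (one per month) in this way: the more tests you run, the more likely it is that at least one comes out significant purely by chance. This link provides information about that.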

    With that caveat, here's how to do what you want with some arbitrary data:

    library(BSDA)  # provides tsum.test()

    # arbitrary example data: mean, SD and sample size for each of two groups
    dat <- data.frame(m1=c(24,11,34),
                      sd1=c(1.3,4.2,2.3),
                      n1=c(30, 31, 30),
                      m2=c(18,8,22), 
                      sd2=c(1.8, 3.4, 1.8),
                      n2=c(30,31,30))
    
    # user function to do the t-test on one row and return the desired values
    do.tsum <- function(x) {
        # tsum.test takes each summary statistic as a separate argument, in the
        # order mean.x, s.x, n.x, mean.y, s.y, n.y, so pass the row's values one
        # by one; the default var.equal = FALSE gives a Welch (unequal-variance) test
        results <- tsum.test(x[1],x[2],x[3],x[4],x[5],x[6],alternative='two.sided')
        return(c(results$statistic, results$parameters, results$p.value))
    }
    
    # use apply to do the tsum.test on each row (MARGIN = 1 for rows, 2 for cols),
    # then transpose the resulting matrix and convert it to a data frame.
    # The columns must be in the order above; drop any others (e.g. month) first.
    t.results <- data.frame(t(apply(dat, 1, do.tsum)))
    
    # unfortunately the p-value comes back without a name of its own (the column
    # is labelled 'm1'), so use names() to rename the third column
    names(t.results)[3] <- 'p.value'
    

    Output is as follows:

              t       df      p.value
    1 14.800910 52.78253 1.982944e-20
    2  3.091083 57.50678 3.072783e-03
    3 22.504396 54.83298 2.277676e-29
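If installing BSDA isn't an option, the same Welch test can be computed directly in base R and vectorised over all rows at once. This is a minimal sketch, not part of the original answer: it assumes the question's layout plus two hypothetical sample-size columns, n0 and n1, which your data would need to supply, and the function name welch_from_summary is made up for illustration. It builds dat1 with the columns the question asks for, and adds a Holm-adjusted p-value (base R's p.adjust) as one possible response to the multiple-comparisons caveat.

```r
# Welch's t-test from summary statistics, vectorised over rows (base R only).
# welch_from_summary is a hypothetical helper; n0 and n1 are assumed columns.
welch_from_summary <- function(m0, s0, n0, m1, s1, n1) {
  v0 <- s0^2 / n0                       # squared standard error, group 0
  v1 <- s1^2 / n1                       # squared standard error, group 1
  t_stat <- (m0 - m1) / sqrt(v0 + v1)   # Welch t statistic
  # Welch-Satterthwaite degrees of freedom
  df <- (v0 + v1)^2 / (v0^2 / (n0 - 1) + v1^2 / (n1 - 1))
  p <- 2 * pt(-abs(t_stat), df)         # two-sided p-value
  data.frame(t.statistic = t_stat, DF = df, p.value = p)
}

# example data in the question's layout (values are arbitrary)
dat <- data.frame(month = 1:3,
                  mean0 = c(24, 11, 34), sd0 = c(1.3, 4.2, 2.3), n0 = c(30, 31, 30),
                  mean1 = c(18, 8, 22),  sd1 = c(1.8, 3.4, 1.8), n1 = c(30, 31, 30))

# one row per month: month, t-statistic, DF, p-value
dat1 <- data.frame(month = dat$month,
                   with(dat, welch_from_summary(mean0, sd0, n0, mean1, sd1, n1)))

# optional: Holm adjustment, since one test is run per month
dat1$p.holm <- p.adjust(dat1$p.value, method = "holm")
dat1
```

With these numbers the t and df values match the tsum.test output shown above (t of about 14.80, 3.09 and 22.50). The vectorised form avoids apply() converting the data frame to a matrix and keeps the month column alongside the results.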