Search code examples
rt-test

Independent t-test with an aov-object in R


Is there a simple way to compute independent t-tests with an aov object, similar to how pairwise comparisons can be computed by pairs(emmeans(aov.object,~variable)?


Solution

  • I'm trying to discern what the question really is. This answer is based on this guess:

    Is it possible to use emmeans() to do all independent t tests comparing several samples, two at a time?

    Pooled t tests: NO (similar but not exactly)

    The answer to that question is "not exactly". I'll illustrate with the InsectSprays data in the datasets package. We may fit this aov model:

    > fm1 <- aov(count ~ spray, data = InsectSprays)
    

    Then use emmeans to compare the sprays:

    > library("emmeans")
    > pairs(emmeans(fm1, "spray"), adjust = "none")
    
     contrast estimate  SE df t.ratio p.value
     A - B      -0.833 1.6 66 -0.520  0.6045 
     A - C      12.417 1.6 66  7.755  <.0001 
     A - D       9.583 1.6 66  5.985  <.0001 
     A - E      11.000 1.6 66  6.870  <.0001 
     A - F      -2.167 1.6 66 -1.353  0.1806 
     B - C      13.250 1.6 66  8.276  <.0001 
     B - D      10.417 1.6 66  6.506  <.0001 
     B - E      11.833 1.6 66  7.391  <.0001 
     B - F      -1.333 1.6 66 -0.833  0.4080 
     C - D      -2.833 1.6 66 -1.770  0.0814 
     C - E      -1.417 1.6 66 -0.885  0.3795 
     C - F     -14.583 1.6 66 -9.108  <.0001 
     D - E       1.417 1.6 66  0.885  0.3795 
     D - F     -11.750 1.6 66 -7.339  <.0001 
     E - F     -13.167 1.6 66 -8.223  <.0001 
    

    This does perform t tests comparing two means at a time (adding adjust = "none" specifies that there is no multiplicity adjustment, so each test is taken as-is with no reference to the other test). The values of the means being compared are exactly the means that would be compared if you had run t.test() separately with each pair of samples. However, these results are not exactly the same as the t.test() results would be, because the error variance in fm1 is estimated by pooling all 6 samples together, whereas t.test() would use the variability in only the two samples used in a particular test. I'll show this explicitly for the A - B comparison:

    > t.test(count ~ spray, data = InsectSprays, subset = (spray %in% c("A","B")), var.equal = TRUE)
    
        Two Sample t-test
    
    data:  count by spray
    t = -0.45352, df = 22, p-value = 0.6546
    alternative hypothesis: true difference in means is not equal to 0
    95 percent confidence interval:
     -4.643994  2.977327
    sample estimates:
    mean in group A mean in group B 
           14.50000        15.33333
    

    Note that the t ratio is slightly different, and there are 22 d.f. rather than 66, because only the variances of the two samples were pooled.

    Welch t tests: YES

    It is possible to use a similar method to obtain all of the t tests exactly, under different assumptions that the variances are not all equal. This requires a model that does not specify equal variances, as aov() does. Here is the same example using a gls() model:

    > library("nlme")
    > fm2 <- gls(count ~ spray, data = InsectSprays, weights = varIdent(form = ~ 1 | spray))
    
    > pairs(emmeans(fm2, "spray", df.method = "boot"), adjust = "none")
     contrast estimate    SE   df t.ratio p.value
     A - B      -0.833 1.837 21.8 -0.454  0.6547 
     A - C      12.417 1.477 14.7  8.407  <.0001 
     A - D       9.583 1.542 16.7  6.214  <.0001 
     A - E      11.000 1.451 13.9  7.580  <.0001 
     A - F      -2.167 2.252 20.5 -0.962  0.3473 
     B - C      13.250 1.358 15.5  9.754  <.0001 
     B - D      10.417 1.429 17.8  7.289  <.0001 
     B - E      11.833 1.330 14.5  8.894  <.0001 
     B - F      -1.333 2.177 19.5 -0.613  0.5472 
     C - D      -2.833 0.920 20.9 -3.078  0.0057 
     C - E      -1.417 0.758 21.6 -1.868  0.0754 
     C - F     -14.583 1.882 13.2 -7.748  <.0001 
     D - E       1.417 0.879 19.6  1.612  0.1229 
     D - F     -11.750 1.934 14.5 -6.076  <.0001 
     E - F     -13.167 1.862 12.7 -7.071  <.0001 
    
    Degrees-of-freedom method: satterthwaite 
    

    Note that the test of the A - B comparison is identical to that of t.test(), which by default takes the variances to be unequal:

    > t.test(count ~ spray, data = InsectSprays, subset = (spray %in% c("A","B")))
    
        Welch Two Sample t-test
    
    data:  count by spray
    t = -0.45352, df = 21.784, p-value = 0.6547
    alternative hypothesis: true difference in means is not equal to 0
    95 percent confidence interval:
     -4.646182  2.979515
    sample estimates:
    mean in group A mean in group B 
           14.50000        15.33333