Search code examples
rstatisticscovarianceanovat-test

How to run a multiple t-tests or ANOVA that controls for multiple variables in R?


I have df1:

    Rate       Dogs        MHI_2018  Points      Level AGE65_MORE P_Elderly
1  0.10791173  0.00000000    59338 236.4064         C       8653  15.56267
2  0.06880040  0.00000000    57588 229.4343         C      44571  20.44335
3  0.08644537  0.00000000    50412 200.8446         C      10548  18.23651
4  0.29591635  0.00000000    29267 116.6016         A       1661  16.38390
5  0.05081301  0.00000000    37365 148.8645         B       3995  20.29980
6  0.02625200  0.00000000    45400 180.8765         D      20247  17.71748
7  0.80321285  0.02974862    39917 159.0319         D       6562  19.52105
8  0.07682852  0.00000000    42132 167.8566         D       5980  22.97173
9  0.18118814  0.00000000    47547 189.4303         B       7411  16.78482
10 0.07787555  0.00000000    39907 158.9920         B       2953  22.99665
11 0.15065913  0.00000000    39201 156.1793         C       2751  20.72316
12 0.33362247  0.00000000    46495 185.2390         B       2915  19.45019
13 0.03652168  0.00000000    49055 195.4382         B      10914  19.92988
14 0.27998133  0.00000000    42423 169.0159         A       2481  23.15446
15 0.05407451  0.00000000    40203 160.1713         A       7790  21.06202
16 0.07233796  0.00000000    39057 155.6056         A       2629  19.01765
17 0.08389061  0.00000000    45796 182.4542         B      15446  18.51106
18 0.05220569  0.00000000    34035 135.5976         B       6921  18.06578
19 0.05603418  0.00000000    39491 157.3347         B      12322  17.26133
20 0.15875536  0.00000000    60367 240.5060         C      12400  15.14282

I would like to test if the means of Rate of four different Level groups (A,B,C,D) significantly differ. I know that I can usually run a t-test if there are two groups in level, but since there are four groups, I was thinking I could either run 6 t-tests, or I could run an ANOVA, how is an ANOVA run and interpreted?

Additionally, I would like to see if the variable P_Elderly is a significant covariate that explains some of the relationship between Level and Rate. If I have additional covariates that I would like to add later, how would I do that?


Solution

  • You can fit a linear model, where the Rate is explained by Level:

    fit0 = lm(Rate ~ Level,data=df)
    

    You can look at the coefficients:

    coefs = coefficients(fit0)
    coefs
    (Intercept)      LevelB      LevelC      LevelD 
     0.17557754 -0.06655862 -0.06106314  0.12652025
    

    Here A is set as the reference, and the coefficients indicate how much their means differ from that of A. So we can test whether Level B : D are zero, meaning one common intercept is enough for this model:

    library(car)
    linearHypothesis(fit0,names(coefs)[-1],test="F")
    Linear hypothesis test
    
    Hypothesis:
    LevelB = 0
    LevelC = 0
    LevelD = 0
    
    Model 1: restricted model
    Model 2: Rate ~ Level
    
      Res.Df     RSS Df Sum of Sq      F Pr(>F)
    1     19 0.59848                           
    2     16 0.50688  3  0.091608 0.9639 0.4338
    

    This is similar to an anova, where you test the significance of all your level coefficients at one go.

    anova(fit0)
    Analysis of Variance Table
    
    Response: Rate
              Df  Sum Sq  Mean Sq F value Pr(>F)
    Level      3 0.09161 0.030536  0.9639 0.4338
    Residuals 16 0.50688 0.031680
    

    So most likely the means are not very different, going by the above. You can also do a pairwise test like this:

    library(multcomp)
    summary(glht(fit0,linfct = mcp(Level = "Tukey")))
    

    For your next question, how to add a covariate, you will fit another model:

    fit_full = lm(Rate ~ Level+P_Elderly,data=df)
    

    And compare it against the model with only Level:

    anova(f0,fit_full)
    
    Analysis of Variance Table
    
    Model 1: Rate ~ Level
    Model 2: Rate ~ Level + P_Elderly
      Res.Df     RSS Df Sum of Sq      F Pr(>F)
    1     16 0.50688                           
    2     15 0.50150  1 0.0053721 0.1607 0.6942
    

    Against, doesn't seem like Elderly has much of an effect..