I have df1:
Rate Dogs MHI_2018 Points Level AGE65_MORE P_Elderly
1 0.10791173 0.00000000 59338 236.4064 C 8653 15.56267
2 0.06880040 0.00000000 57588 229.4343 C 44571 20.44335
3 0.08644537 0.00000000 50412 200.8446 C 10548 18.23651
4 0.29591635 0.00000000 29267 116.6016 A 1661 16.38390
5 0.05081301 0.00000000 37365 148.8645 B 3995 20.29980
6 0.02625200 0.00000000 45400 180.8765 D 20247 17.71748
7 0.80321285 0.02974862 39917 159.0319 D 6562 19.52105
8 0.07682852 0.00000000 42132 167.8566 D 5980 22.97173
9 0.18118814 0.00000000 47547 189.4303 B 7411 16.78482
10 0.07787555 0.00000000 39907 158.9920 B 2953 22.99665
11 0.15065913 0.00000000 39201 156.1793 C 2751 20.72316
12 0.33362247 0.00000000 46495 185.2390 B 2915 19.45019
13 0.03652168 0.00000000 49055 195.4382 B 10914 19.92988
14 0.27998133 0.00000000 42423 169.0159 A 2481 23.15446
15 0.05407451 0.00000000 40203 160.1713 A 7790 21.06202
16 0.07233796 0.00000000 39057 155.6056 A 2629 19.01765
17 0.08389061 0.00000000 45796 182.4542 B 15446 18.51106
18 0.05220569 0.00000000 34035 135.5976 B 6921 18.06578
19 0.05603418 0.00000000 39491 157.3347 B 12322 17.26133
20 0.15875536 0.00000000 60367 240.5060 C 12400 15.14282
I would like to test if the means of Rate
of four different Level
groups (A,B,C,D) significantly differ. I know that I can usually run a t-test if there are two groups in level, but since there are four groups, I was thinking I could either run 6 t-tests, or I could run an ANOVA, how is an ANOVA run and interpreted?
Additionally, I would like to see if the variable P_Elderly
is a significant covariate that explains some of the relationship between Level
and Rate
. If I have additional covariates that I would like to add later, how would I do that?
You can fit a linear model, where the Rate is explained by Level:
fit0 = lm(Rate ~ Level,data=df)
You can look at the coefficients:
coefs = coefficients(fit0)
coefs
(Intercept) LevelB LevelC LevelD
0.17557754 -0.06655862 -0.06106314 0.12652025
Here A is set as the reference, and the coefficients indicate how much their means differ from that of A. So we can test whether Level B : D are zero, meaning one common intercept is enough for this model:
library(car)
linearHypothesis(fit0,names(coefs)[-1],test="F")
Linear hypothesis test
Hypothesis:
LevelB = 0
LevelC = 0
LevelD = 0
Model 1: restricted model
Model 2: Rate ~ Level
Res.Df RSS Df Sum of Sq F Pr(>F)
1 19 0.59848
2 16 0.50688 3 0.091608 0.9639 0.4338
This is similar to an anova, where you test the significance of all your level coefficients at one go.
anova(fit0)
Analysis of Variance Table
Response: Rate
Df Sum Sq Mean Sq F value Pr(>F)
Level 3 0.09161 0.030536 0.9639 0.4338
Residuals 16 0.50688 0.031680
So most likely the means are not very different, going by the above. You can also do a pairwise test like this:
library(multcomp)
summary(glht(fit0,linfct = mcp(Level = "Tukey")))
For your next question, how to add a covariate, you will fit another model:
fit_full = lm(Rate ~ Level+P_Elderly,data=df)
And compare it against the model with only Level:
anova(f0,fit_full)
Analysis of Variance Table
Model 1: Rate ~ Level
Model 2: Rate ~ Level + P_Elderly
Res.Df RSS Df Sum of Sq F Pr(>F)
1 16 0.50688
2 15 0.50150 1 0.0053721 0.1607 0.6942
Against, doesn't seem like Elderly has much of an effect..