I am working with survey data and need to compare the means of two variables. Because this is survey data, I need to apply survey weights, which requires the svy prefix. This means that I cannot rely on Stata's ttest command. I essentially need to recreate the results of the following two ttest commands:
ttest bcg_vaccinated == chc_bcg_vaccinated_2, unpaired
ttest bcg_vaccinated == chc_bcg_vaccinated_2
bcg_vaccinated is a self-reported variable on BCG vaccination status, while chc_bcg_vaccinated_2 is BCG vaccination status verified against a child health card. You will notice that chc_bcg_vaccinated_2 has missing values. These indicate that the child did not have a health card. So missing means no health card, 0 means the vaccination was not given, and 1 means the vaccination was given. As a result, the two variables have different numbers of non-missing observations.
I have found a solution for the second ttest command by creating a variable that is the difference between the two vaccination variables:
gen test_diff = bcg_vaccinated - chc_bcg_vaccinated_2
regress test_diff
The above code runs only on the observations where both vaccination variables are non-missing, replicating the paired t-test listed above. Unfortunately, I cannot figure out how to do the first (unpaired) version, which would compare the means of the two variables using every non-missing observation of each.
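With survey weights, the paired comparison would presumably look something like the sketch below. The design variable names psu, stratum, and wt are placeholders, not variables from my data, and test_diff is assumed to exist already from the gen command above.

```stata
* Hypothetical design variables: psu, stratum, wt (substitute your own)
svyset psu [pweight = wt], strata(stratum)

* The constant is the weighted mean of the paired differences,
* with a design-based standard error
svy: regress test_diff
```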
Here are some example data for the two variables. Each row represents a different child.
input byte bcg_vaccinated float chc_bcg_vaccinated_2
0 .
1 0
1 1
1 1
1 0
0 .
1 1
1 1
1 1
1 0
0 .
1 1
1 1
0 .
1 1
1 1
1 0
0 .
1 0
1 0
1 0
0 .
0 .
1 1
0 .
end
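You need to get the data into a suitable form for a regression: stack the two variables into a single outcome variable with a group indicator. The unpaired ttest is shown first for comparison: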
. ttest bcg_vaccinated == chc_bcg_vaccinated_2, unpaired
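Two-sample t test with equal variances
------------------------------------------------------------------------------
Variable |     Obs        Mean    Std. err.   Std. dev.   [95% conf. interval]
---------+--------------------------------------------------------------------
bcg_va~d |      25         .68     .095219    .4760952    .4834775    .8765225
chc_bc~2 |      17    .5882353    .1230382    .5072997    .3274059    .8490647
---------+--------------------------------------------------------------------
Combined |      42    .6428571    .0748318    .4849656    .4917312    .7939831
---------+--------------------------------------------------------------------
    diff |            .0917647    .1536653               -.2188044    .4023338
------------------------------------------------------------------------------
    diff = mean(bcg_vaccinated) - mean(chc_bcg_vaccin~2)          t =   0.5972
H0: diff = 0                                     Degrees of freedom =       40

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.7231         Pr(|T| > |t|) = 0.5538          Pr(T > t) = 0.2769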
. display r(p)
. quietly stack bcg_vaccinated chc_bcg_vaccinated_2, into(vax_status) clear
. quietly recode _stack (1 = 1 "SR") (2 = 0 "CHC"), gen(group) label(group)
. regress vax_status i.group
      Source |       SS           df       MS      Number of obs   =        42
-------------+----------------------------------   F(1, 40)        =      0.36
       Model |  .085210084         1  .085210084   Prob > F        =    0.5538
    Residual |  9.55764706        40  .238941176   R-squared       =    0.0088
-------------+----------------------------------   Adj R-squared   =   -0.0159
       Total |  9.64285714        41  .235191638   Root MSE        =    .48882

------------------------------------------------------------------------------
  vax_status | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
       group |
         SR  |   .0917647   .1536653     0.60   0.554    -.2188044    .4023338
       _cons |   .5882353   .1185553     4.96   0.000     .3486261    .8278445
------------------------------------------------------------------------------
. testparm 1.group
( 1) 1.group = 0
F( 1, 40) = 0.36
Prob > F = 0.5538
. display r(p)
The testparm and display commands are not needed; they just show the p-value to more digits.
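To carry this over to the survey setting, the design variables have to survive the restructuring, which stack with clear does not do by itself. One possible sketch uses reshape long instead, which keeps the other variables in the dataset. The names psu, stratum, wt, child_id, vax0, vax1, and vax are hypothetical and introduced only for illustration; substitute your own design variables.

```stata
* Hypothetical design variables: psu, stratum, wt (substitute your own)
* One row per child before reshaping, so _n serves as an identifier
generate long child_id = _n

* The numeric suffix becomes the group indicator after reshaping:
* 1 = self-reported (SR), 0 = child health card (CHC)
rename bcg_vaccinated       vax1
rename chc_bcg_vaccinated_2 vax0

* Two rows per child; psu, stratum, and wt are copied to both rows
reshape long vax, i(child_id) j(group)

svyset psu [pweight = wt], strata(stratum)

* The coefficient on 1.group is the weighted SR mean minus the weighted CHC mean
svy: regress vax i.group
```

Rows with a missing vax (children without a health card) drop out of the estimation sample, as in the unweighted regression. Both rows for a child fall in the same PSU, so the design-based standard errors should allow for the correlation between them, but treat this as a sketch to check against your own design rather than a definitive recipe.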