Tags: statistics, stata, survey, t-test

Comparing means of two variables with the svy prefix in Stata (no ttest)


I am working with survey data and need to compare the means of two variables. Because the data must be analysed with survey weights, I have to use the svy prefix, and Stata's ttest command does not support it. I essentially need to recreate the results of the following two ttest commands under the survey design:

ttest bcg_vaccinated == chc_bcg_vaccinated_2, unpaired
ttest bcg_vaccinated == chc_bcg_vaccinated_2

bcg_vaccinated is self-reported BCG vaccination status, while chc_bcg_vaccinated_2 is BCG vaccination status verified against a child health card. chc_bcg_vaccinated_2 has missing values, which indicate that the child did not have a health card: missing means no health card, 0 means the vaccination was not given, and 1 means it was given. As a result, the two variables have different numbers of non-missing observations.

I have found a solution for the second ttest command (the paired test) by creating a variable that is the difference between the two vaccination variables:

gen test_diff = bcg_vaccinated - chc_bcg_vaccinated_2 
regress test_diff

The above code runs only for the observations where both vaccination variables are non-missing, replicating the paired t-test listed above. Unfortunately, I cannot figure out how to do the first version. The first version would compare the means of both variables on the full set of observations.
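As a check on the paired case, the sketch below compares the paired ttest with the constant-only regression on the difference, then reruns the regression with the svy prefix. The data are rebuilt with the same cell counts as the example data (8 rows of 0/missing, 7 of 1/0, 10 of 1/1), in a different row order. The design variables psu and wt are placeholders made up only so the sketch runs; they are not part of the original data, and a real analysis would use the survey's own design variables in svyset.

```stata
* Rebuild the example data by cell counts (row order differs from the listing)
clear
set obs 25
gen byte bcg_vaccinated = _n > 8
gen float chc_bcg_vaccinated_2 = (_n > 15) if _n > 8

* Paired t test: uses only rows where both variables are non-missing
ttest bcg_vaccinated == chc_bcg_vaccinated_2
scalar t_paired = r(t)

* Constant-only regression on the difference: _cons is the mean difference
gen test_diff = bcg_vaccinated - chc_bcg_vaccinated_2
regress test_diff
scalar t_reg = _b[_cons] / _se[_cons]
display t_paired, t_reg

* Placeholder design variables (assumptions, not from the original data)
gen psu = ceil(_n / 5)
gen wt  = 1
svyset psu [pweight = wt]

* Survey-weighted version of the same comparison
svy: regress test_diff
```

The two t statistics shown by display should agree, since a regression on a constant alone tests whether the mean of test_diff is zero. The svy result will generally differ from both, because its standard error is design-based.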

Here are some example data for the two variables. Each row represents a different child.

clear
input byte bcg_vaccinated float chc_bcg_vaccinated_2
0 .
1 0
1 1
1 1
1 0
0 .
1 1
1 1
1 1
1 0
0 .
1 1
1 1
0 .
1 1
1 1
1 0
0 .
1 0
1 0
1 0
0 .
0 .
1 1
0 .
end

Solution

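  • You need to get the data into a suitable form for a regression: stack the two variables into a single outcome variable with a group indicator. The regression on that indicator reproduces the unpaired t test below (same difference, standard error, and p-value), and regress, unlike ttest, can be used with the svy prefix: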

    . ttest bcg_vaccinated == chc_bcg_vaccinated_2, unpaired
    
    Two-sample t test with equal variances
    ------------------------------------------------------------------------------
    Variable |     Obs        Mean    Std. err.   Std. dev.   [95% conf. interval]
    ---------+--------------------------------------------------------------------
    bcg_va~d |      25         .68     .095219    .4760952    .4834775    .8765225
    chc_bc~2 |      17    .5882353    .1230382    .5072997    .3274059    .8490647
    ---------+--------------------------------------------------------------------
    Combined |      42    .6428571    .0748318    .4849656    .4917312    .7939831
    ---------+--------------------------------------------------------------------
        diff |            .0917647    .1536653               -.2188044    .4023338
    ------------------------------------------------------------------------------
        diff = mean(bcg_vaccinated) - mean(chc_bcg_vaccin~2)          t =   0.5972
    H0: diff = 0                                     Degrees of freedom =       40
    
        Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
     Pr(T < t) = 0.7231         Pr(|T| > |t|) = 0.5538          Pr(T > t) = 0.2769
    
    . display r(p)
    .5537576
    
    . quietly stack bcg_vaccinated chc_bcg_vaccinated_2, into(vax_status) clear
    
    . quietly recode _stack (1 = 1 "SR") (2 = 0 "CHC"), gen(group) label(group)
    
    . regress vax_status i.group
    
          Source |       SS           df       MS      Number of obs   =        42
    -------------+----------------------------------   F(1, 40)        =      0.36
           Model |  .085210084         1  .085210084   Prob > F        =    0.5538
        Residual |  9.55764706        40  .238941176   R-squared       =    0.0088
    -------------+----------------------------------   Adj R-squared   =   -0.0159
           Total |  9.64285714        41  .235191638   Root MSE        =    .48882
    
    ------------------------------------------------------------------------------
      vax_status | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
    -------------+----------------------------------------------------------------
           group |
             SR  |   .0917647   .1536653     0.60   0.554    -.2188044    .4023338
           _cons |   .5882353   .1185553     4.96   0.000     .3486261    .8278445
    ------------------------------------------------------------------------------
    
    . testparm 1.group
    
     ( 1)  1.group = 0
    
           F(  1,    40) =    0.36
                Prob > F =    0.5538
    
    . display r(p)
    .5537576
    

    The testparm and display commands are not needed; they just show the p-value with more digits.
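To carry this over to the survey setting, one possible sketch is below. Note that stack with clear keeps only the variables it is given, so the design variables have to be stacked alongside each outcome. The names psu and wt (and the stacked copies psu_id and pw) are placeholders made up so the sketch runs; they are assumptions, not variables from the original data. The data are rebuilt with the same cell counts as the example data, in a different row order.

```stata
* Rebuild the example data by cell counts (row order differs from the listing)
clear
set obs 25
gen byte bcg_vaccinated = _n > 8
gen float chc_bcg_vaccinated_2 = (_n > 15) if _n > 8

* Placeholder design variables (assumptions, not from the original data)
gen psu = ceil(_n / 5)
gen wt  = 1

* Stack each outcome together with its own copy of the design variables
stack bcg_vaccinated psu wt  chc_bcg_vaccinated_2 psu wt, ///
    into(vax_status psu_id pw) clear
recode _stack (1 = 1 "SR") (2 = 0 "CHC"), gen(group) label(group)

* Declare the design on the stacked data and fit the weighted regression
svyset psu_id [pweight = pw]
svy: regress vax_status i.group
```

The coefficient on SR is the weighted difference in means. Each child with a health card appears twice in the stacked data, and both rows sit in the same PSU, so the design-based standard error should allow for the correlation between them rather than treating the two groups as independent samples. If the design has no clustering variable, declaring a child identifier as the PSU may be a reasonable substitute, but that is a judgment call for the actual survey design.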