Stata twoway graph of means with confidence intervals


Using this example data:

clear 
score      group     test
 2          0         A
 3          0         B
 6          0         B
 8          0         A
 2          0         A
 2          0         A
 10         1         B
 7          1         B
 8          1         A
 5          1         A
 10         1         A
 11         1         B
end

I want to plot the mean score by group for each test on the same graph, with confidence intervals (the real data have thousands of observations). The resulting graph would have two sets of two dots: one set for test == "A" (group == 0 vs. group == 1) and one set for test == "B" (group == 0 vs. group == 1).

My current approach works, but it is laborious. I compute all of the needed statistics with egen (the mean, the number of observations, the standard deviation) for each group by test. I then collapse the data and plot.

There has to be another way, no?

I assumed that Stata would be able to take the score, group, and test variables as input and then compute and present this fairly standard graph.

After spending a lot of time on Google, I had to ask.


Solution

  • Although there are user-written programs, I lean towards statsby as a basic approach here. Discussion is accessible in this paper.

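    This example builds on your data example, which is almost executable code: it lacks only an input command that declares test as str1. Some graph choices (the axis labels, where the sample sizes are shown) depend on the very wide confidence intervals that such small samples imply. Note that if your version of Stata is not up to date, the syntax of ci will be different: just omit means and type ci score.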

    clear 
    input score      group    str1 test
     2          0         A
     3          0         B
     6          0         B
     8          0         A
     2          0         A
     2          0         A
     10         1         B
     7          1         B
     8          1         A
     5          1         A
     10         1         A
     11         1         B
    end
    save cj12 , replace 
    
    * test A 
    statsby mean=r(mean) ub=r(ub) lb=r(lb) N=r(N), by(group) clear : ///
    ci means score if test == "A"  
    gen test = "A" 
    save cj12results, replace 
    
    * test B 
    use cj12 
    statsby mean=r(mean) ub=r(ub) lb=r(lb) N=r(N), by(group) clear : ///
    ci means score if test == "B"  
    gen test = "B" 
    append using cj12results 
    
    * graph; show sample sizes too, but where to show them is empirical 
    set scheme s1color 
    gen where = -20 
    scatter mean group, ms(O) mcolor(blue) || ///
    rcap ub lb group, lcolor(blue) ///
    by(test, note("95% confidence intervals") legend(off))  ///
    subtitle(, fcolor(ltblue*0.2)) ///
    ytitle(score) xla(0 1) xsc(r(-0.25 1.25)) yla(-10(10)10, ang(h)) || ///
    scatter where group, ms(none) mla(N) mlabpos(12) mlabsize(*1.5) 
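
    The two statsby passes above could probably be merged into one by naming both variables in by(). The following is a minimal sketch under that assumption, repeating the data input so that it runs on its own and using the newer ci means syntax; it is a variant for illustration, not part of the original code.

```stata
* Sketch: a single statsby call over group and test together
clear
input score group str1 test
 2   0  A
 3   0  B
 6   0  B
 8   0  A
 2   0  A
 2   0  A
 10  1  B
 7   1  B
 8   1  A
 5   1  A
 10  1  A
 11  1  B
end

* one observation per group-by-test cell; test is carried along by by()
statsby mean=r(mean) ub=r(ub) lb=r(lb) N=r(N), by(group test) clear : ///
    ci means score

list group test N mean lb ub, sepby(test)
```

    The resulting dataset has the same variables as the appended results, so the graph commands should apply unchanged once where has been generated.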
    

    We can't compare this with your complete code or your graph, because you show neither.
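
    For reference, the limits that ci means reports can also be reproduced by hand along the lines the question describes. This is only a sketch: it assumes the default t-based 95% interval, mean ± t(0.975, N − 1) × sd / sqrt(N), and the names se, lb and ub are chosen here for illustration.

```stata
* Sketch: group-by-test summaries and t-based 95% limits via collapse
clear
input score group str1 test
 2   0  A
 3   0  B
 6   0  B
 8   0  A
 2   0  A
 2   0  A
 10  1  B
 7   1  B
 8   1  A
 5   1  A
 10  1  A
 11  1  B
end

* mean, standard deviation and count for each group-by-test cell
collapse (mean) mean=score (sd) sd=score (count) N=score, by(group test)

* standard error of the mean and t-based limits with N - 1 degrees of freedom
gen se = sd / sqrt(N)
gen lb = mean - invttail(N - 1, 0.025) * se
gen ub = mean + invttail(N - 1, 0.025) * se

list group test N mean lb ub, sepby(test)
```

    With thousands of observations per cell the t quantile is close to 1.96, so the intervals will be far narrower than in this toy example.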