
Confidence interval plots when the original data don't have upper and lower bounds


I have a dataset in Stata that looks like the following:

math_score  literacy_score  student_gender  Location
      45.1            67.1             Boy     Rural
      32.6            45.2            Girl     Urban
      65.2           87.02            Girl     Urban
      34.2            66.7             Boy     Rural
      56.9            37.8             Boy     Rural
      79.3            45.8             Boy     Urban

and so on. In the actual dataset, the categories are numeric codes: Girl = 0, Boy = 1, Rural = 0 and Urban = 1.

I want to draw bar graphs of mean scores (for example, the literacy score) by gender and by location. In addition to the bars, I want confidence intervals, but the data do not contain lower and upper bound values.
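For reference, the bounds do not need to exist in the data beforehand: `ci means` computes a confidence interval for a mean from the raw values and leaves the results in `r()` (including `r(N)`, `r(mean)`, `r(lb)` and `r(ub)`), which is what `statsby` collects in both the question and the answer. A minimal sketch, assuming only the six rows shown above, typed in with the 0/1 codings from the question:

```stata
* The six rows shown in the question (Girl = 0, Boy = 1; Rural = 0, Urban = 1)
clear
input math_score literacy_score student_gender Location
45.1 67.1  1 0
32.6 45.2  0 1
65.2 87.02 0 1
34.2 66.7  1 0
56.9 37.8  1 0
79.3 45.8  1 1
end

* ci means works from the raw scores; nothing has to be stored in the data first
ci means literacy_score if student_gender == 1

* The interval (95% by default) stays behind in r() for later use
display "mean = " r(mean) ", CI = [" r(lb) ", " r(ub) "]"
```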

I have code that works, but it is very long and saves intermediate datasets that clutter the working directory. Is there an easier way to do this? Thanks.

foreach x in math_score literacy_score {

    * Means and CIs by each grouping variable, saved as Location.dta and student_gender.dta
    foreach var in Location student_gender {
        preserve
        statsby mean_`x' = r(mean) upper = r(ub) lower = r(lb), by(`var') saving(`var', replace) : ci means `x'
        restore
    }

    preserve

    * Stack the two result sets: gender rows first, then location rows
    use student_gender, clear
    append using Location

    gen xgroup = ceil(_n/2)
    label def xgroup 1 student_gender 2 Location
    label val xgroup xgroup

    * Bars at x = 1, 2 (gender) and x = 4, 5 (location), leaving a gap at 3
    gen xaxis = xgroup + _n - 1
    label def xaxis 1 "`: label (student_gender) 0'", add
    label def xaxis 2 "`: label (student_gender) 1'", add
    label def xaxis 4 "`: label (Location) 0'", add
    label def xaxis 5 "`: label (Location) 1'", add
    label val xaxis xaxis

    drop student_gender Location
    list

    gen mylabel = string(mean_`x', "%9.1f")

    local title ""
    if `"`x'"' == "math_score" loc title `"title(Math Student Knowledge Score (0-100))"'
    if `"`x'"' == "literacy_score" loc title `"title(Literacy Student Knowledge Score (0-100))"'

    local ytitle ""
    if `"`x'"' == "math_score" loc ytitle `"ytitle(Mean Math Knowledge Score)"'
    if `"`x'"' == "literacy_score" loc ytitle `"ytitle(Mean Literacy Knowledge Score)"'

    twoway (bar mean_`x' xaxis, color(ltblue)) || scatter mean_`x' xaxis, msymbol(none) mlabel(mylabel) ///
        mlabposition(1) mlabs(small) || (rcap lower upper xaxis, color(black)) , xla(1/2 4/5 , valuelabel ///
        noticks labsize(small)) `ytitle' xmla("", tlc(none)) xsc(r(0 6)) xtitle("Student Features", size(medsmall)) legend(off) ///
        `title' ylab(0(25)100) name(`x'_2, replace) scheme(s2color8)

    graph export "$output/`x'_gen_loc.jpg", as(jpg) width(4000) replace

    restore
}

Solution

  • The problem is to show means and confidence intervals for two variables, each broken down by two binary predictors. Your question shows that you already know most of the machinery. The example below, using the hsb2 data (math and write by female and schtyp), shows a few extra small tricks.

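    * Means and CIs of each variable by sex and school type, saved as math.dta and write.dta
    foreach v in math write { 
        use https://stats.idre.ucla.edu/stat/stata/notes/hsb2, clear
        statsby , by(female schtyp) saving(`v', replace) : ci means `v' 
    }
    
    * Stack the two result sets and record which variable each row describes
    use math 
    gen which = "Mathematics"
    append using write 
    replace which = "Writing" if missing(which)
    
    * Positions 1 to 4 repeat within each variable; each compound-quoted label prints on two lines
    egen xaxis = seq(), to(4)
    label def xaxis 1 `" "male" "public" "' 2 `" "male" "private" "' 3 `" "female" "public" "' 4 `" "female" "private" "'
    label val xaxis xaxis 
    
    * stcolor is the default scheme from Stata 18; use s2color in older versions
    set scheme stcolor 
    
    twoway scatter mean xaxis, ms(D) || rcap ub lb xaxis, ///
        xla(1/4, valuelabel tlc(none)) xsc(r(0.5 4.5)) xtitle("") ///
        by(which, note("") legend(off)) ytitle(Score)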
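
The statsby step still leaves small .dta files in the working directory. If you want no files at all, a t-based 95% interval needs only the mean, standard deviation and count in each group, namely mean +/- invttail(n - 1, 0.025) * sd / sqrt(n), which is the interval `ci means` reports by default. So the same numbers can be built in memory with `collapse`. A minimal sketch, assuming the six example rows from the question and grouping by gender only; the names `mean`, `sd`, `n`, `se`, `lb`, `ub` and the label name `gender` are illustrative choices. Run it as a do-file because of the `///` continuation.

```stata
* The six rows from the question (Girl = 0, Boy = 1; Rural = 0, Urban = 1)
clear
input math_score literacy_score student_gender Location
45.1 67.1  1 0
32.6 45.2  0 1
65.2 87.02 0 1
34.2 66.7  1 0
56.9 37.8  1 0
79.3 45.8  1 1
end
label define gender 0 "Girl" 1 "Boy"
label values student_gender gender

* One row per group: mean, standard deviation and number of nonmissing scores
collapse (mean) mean = literacy_score (sd) sd = literacy_score ///
    (count) n = literacy_score, by(student_gender)

* 95% t-based interval: mean +/- t(n - 1) * sd / sqrt(n)
generate se = sd / sqrt(n)
generate lb = mean - invttail(n - 1, 0.025) * se
generate ub = mean + invttail(n - 1, 0.025) * se
list student_gender n mean lb ub
```

With only two girls in this toy example, the girls' interval is very wide, which is expected with n = 2.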
    


    If someone insists on a bar chart, you can of course change the syntax. But see (e.g.) this or this before you do.
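
If a bar chart is what you need, the question's layout (Girl and Boy, a gap, then Rural and Urban) can be drawn without leaving extra datasets on disk by sending the statsby results to `tempfile`s, which Stata deletes when the do-file ends. This is a sketch under stated assumptions: the six example rows and 0/1 codings from the question, `literacy_score` only, and hand-written axis labels; the tempfile names `raw`, `bygender` and `byloc` are illustrative. For real data you would loop over the score variables as in the question, and run it as a do-file because of the `///` continuations.

```stata
* The six rows from the question (Girl = 0, Boy = 1; Rural = 0, Urban = 1)
clear
input math_score literacy_score student_gender Location
45.1 67.1  1 0
32.6 45.2  0 1
65.2 87.02 0 1
34.2 66.7  1 0
56.9 37.8  1 0
79.3 45.8  1 1
end

* Temporary files are removed automatically when the do-file finishes
tempfile raw bygender byloc
save "`raw'"

* Means and CIs by gender, then by location, from a fresh copy of the data
use "`raw'", clear
statsby mean = r(mean) lb = r(lb) ub = r(ub), by(student_gender) ///
    saving("`bygender'") : ci means literacy_score

use "`raw'", clear
statsby mean = r(mean) lb = r(lb) ub = r(ub), by(Location) ///
    saving("`byloc'") : ci means literacy_score

* Stack the two small result sets: gender rows first, then location rows
use "`bygender'", clear
append using "`byloc'"

* Bars at x = 1, 2 and 4, 5, leaving a gap at 3 between the two groupings
generate xaxis = cond(_n <= 2, _n, _n + 1)
label define xaxis 1 "Girl" 2 "Boy" 4 "Rural" 5 "Urban"
label values xaxis xaxis
generate mylabel = string(mean, "%9.1f")

twoway (bar mean xaxis, color(ltblue))                          ///
       (rcap ub lb xaxis, color(black))                         ///
       (scatter mean xaxis, msymbol(none) mlabel(mylabel)      ///
           mlabposition(1)),                                    ///
       xlabel(1 2 4 5, valuelabel noticks) xscale(range(0 6))   ///
       xtitle("") ytitle("Mean literacy score") legend(off)
```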