I have a dataset in Stata that looks like the following:
math_score | literacy_score | student_gender | Location |
---|---|---|---|
45.1 | 67.1 | Boy | Rural |
32.6 | 45.2 | Girl | Urban |
65.2 | 87.02 | Girl | Urban |
34.2 | 66.7 | Boy | Rural |
56.9 | 37.8 | Boy | Rural |
79.3 | 45.8 | Boy | Urban |
and so on. In the actual dataset these variables are encoded numerically: Girl==0, Boy==1, Rural==0, and Urban==1.
I want to generate bar graphs showing the mean literacy score by gender and by location. In addition to the bars, I want confidence intervals, but the data do not contain lower and upper bounds.
I have code that works, but it is very lengthy and saves additional datasets that create clutter. Please let me know if there is an easier way to do this. Thanks.
foreach x of varlist math_score literacy_score {
    * Save means and confidence limits for each grouping variable to its own file
    foreach var in Location student_gender {
        preserve
        statsby mean_`x'=r(mean) upper=r(ub) lower=r(lb), by(`var') saving(`var', replace) : ci mean `x'
        restore
    }
    preserve
    * Stack the two results files and build the x axis positions 1, 2, 4, 5
    use student_gender, clear
    append using Location
    gen xgroup = ceil(_n/2)
    label def xgroup 1 student_gender 2 Location
    label val xgroup xgroup
    gen xaxis = xgroup + _n - 1
    label def xaxis 1 "`: label (student_gender) 0'", add
    label def xaxis 2 "`: label (student_gender) 1'", add
    label def xaxis 4 "`: label (Location) 0'", add
    label def xaxis 5 "`: label (Location) 1'", add
    label val xaxis xaxis
    drop student_gender Location
    list
    gen mylabel = string(mean_`x', "%9.1f")
    * Titles depend on which outcome is being plotted
    local title=""
    if `"`x'"' == "math_score" loc title `"title(Math Student Knowledge Score (0-100))"'
    if `"`x'"' == "literacy_score" loc title `"title(Literacy Student Knowledge Score (0-100))"'
    local ytitle=""
    if `"`x'"' == "math_score" loc ytitle `"ytitle(Mean Math Knowledge Score)"'
    if `"`x'"' == "literacy_score" loc ytitle `"ytitle(Mean Literacy Knowledge Score)"'
    twoway (bar mean_`x' xaxis, color(ltblue)) ///
        || scatter mean_`x' xaxis, msymbol(none) mlabel(mylabel) mlabposition(1) mlabs(small) ///
        || (rcap lower upper xaxis, color(black)), ///
        xla(1/2 4/5, valuelabel noticks labsize(small)) `ytitle' xmla("", tlc(none)) ///
        xsc(r(0 6)) xtitle("Student Features", size(medsmall)) legend(off) ///
        `title' ylab(0(25)100) name(`x'_2, replace) scheme(s2color8)
    graph export "$output/`x'_gen_loc.jpg" , as(jpg) width(4000) replace
    restore
}
The problem is to show means and confidence intervals of two variables, each by two binary predictors. Your question shows that you already know most of the machinery. The code below, which uses a public dataset as a stand-in for yours, shows some extra small tricks.
foreach v in math write {
    use https://stats.idre.ucla.edu/stat/stata/notes/hsb2, clear
    statsby , by(female schtyp) saving(`v', replace) : ci means `v'
}
* Stack the two results files, tagging each with its subject
use math
gen which = "Mathematics"
append using write
replace which = "Writing" if missing(which)
* x positions 1 to 4 within each subject, labelled by sex and school type
egen xaxis = seq(), to(4)
label def xaxis 1 `" "male" "public" "' 2 `" "male" "private" "' ///
    3 `" "female" "public" "' 4 `" "female" "private" "'
label val xaxis xaxis
set scheme stcolor
twoway scatter mean xaxis, ms(D) || rcap ub lb xaxis, ///
    xla(1/4, valuelabel tlc(none)) xsc(r(0.5 4.5)) xtitle("") ///
    by(which, note("") legend(off)) ytitle(Score)
If someone insists on a bar chart, you can change the syntax accordingly. But see (e.g.) this or this before you do that.
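For what it is worth, here is a minimal sketch of the bar version. It assumes the combined results dataset from the code above is still in memory, with the variables mean, lb, ub, xaxis and which. The only change is swapping scatter for twoway bar; the barwidth(0.6) value is an arbitrary choice for illustration, not something your data require.

```stata
* Sketch only: assumes the stacked statsby results (mean, lb, ub, xaxis, which)
* built in the code above are the data in memory.
* barwidth(0.6) is an illustrative choice that leaves gaps between the bars.
twoway bar mean xaxis, barwidth(0.6) || rcap ub lb xaxis, ///
    xla(1/4, valuelabel tlc(none)) xsc(r(0.5 4.5)) xtitle("") ///
    by(which, note("") legend(off)) ytitle(Score)
```

Note that twoway bar draws bars from a base of zero by default, so the y axis will extend down to zero and the intervals will occupy a smaller share of the plot than in the point-and-interval version.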