Adding sample size to box plot


We measured antibody levels in different age groups, and our sample size for each group was different.

I would like to add the sample size of each group at the top of my box plot (e.g. the sample size for toddler boys). The attached image shows one of my graphs.

My code to create the graph:

graph box igg1ugml_log10, over(female_sex) over(age_groups) ///
bar(samplesize) graphregion(color(white)) /// 
title(Anti-EPEC IgG1 (ug/ml) in boys and girls) asyvars ///
ylabel(2.69897 "500" 3 "1,000" 3.3 "2000" 3.69 "5000" 3.95 "9000")

bar graph example


Solution

  • To add text to graph box, use the documented text() option. Here is a reproducible example. Apart from using the Graph Editor, I don't have any recipe for working out text positions other than fiddling until the result looks good enough.

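    sysuse auto, clear
    gen logprice = log10(price)

    * mylabels is from SSC; it only needs installing once
    ssc install mylabels
    * check the range of price before choosing axis labels
    su price
    * store labels in local macro yla: positions on log10 scale, text on original scale
    mylabels 3000(2000)15000, myscale(log10(@)) local(yla)
    
    * text() takes y x "text"; ysc(r(. 4.3)) makes room above the boxes
    graph box logprice, over(foreign) yla(`yla', ang(h)) ///
    text(4.25 21.2 "{it:n} = 52") text(4.25 79.8 "{it:n} = 22") ///
    ysc(r(. 4.3)) scheme(s1color) ytitle(Price (USD))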
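
    The sample sizes in text() need not be typed by hand. A minimal sketch, assuming the same auto data and the same hand-picked positions as above; the local macro names n0 and n1 are my own for illustration:

    sysuse auto, clear
    gen logprice = log10(price)

    * count nonmissing observations in each group and keep them in local macros
    count if foreign == 0 & !missing(logprice)
    local n0 = r(N)
    count if foreign == 1 & !missing(logprice)
    local n1 = r(N)

    * the positions 4.25, 21.2 and 79.8 are still found by trial and error
    graph box logprice, over(foreign) ///
    text(4.25 21.2 "{it:n} = `n0'") text(4.25 79.8 "{it:n} = `n1'") ///
    ysc(r(. 4.3)) ytitle("log10 price")

    This only removes the risk of mistyping the counts; the positions still have to be chosen by eye.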
    


    Note. To show the mu of microgram properly, see help graph text in Stata and search for Greek letters.
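
    As a sketch of what that help file describes, the SMCL tag {&mu} can be placed directly inside a quoted title. The title below is borrowed from the question and is attached to the auto data purely for illustration:

    sysuse auto, clear
    * {&mu} is rendered as the Greek letter mu in graph text
    graph box mpg, over(foreign) ///
    title("Anti-EPEC IgG1 ({&mu}g/ml) in boys and girls")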

    EDIT

    stripplot from SSC can produce box plots too, although both its defaults and its possibilities differ from graph box. Here is a reproducible example.

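    * stripplot is from SSC; it only needs installing once
    ssc install stripplot

    sysuse auto, clear
    * sample size in each category of rep78
    egen count = count(mpg), by(rep78)
    
    * y position for the sample-size labels, found by trial and error
    gen where = 10.5 
    
    * overlay the counts as marker labels on invisible markers
    stripplot mpg , box vertical ms(none) pctile(5) over(rep78) ///
    yla(12 41 15(5)40, ang(h)) ///
    addplot(scatter where rep78, mla(count) ms(none) mlabpos(0) ///
    mlabsize(medsmall)) scheme(s1color)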
    


    Again, although this is reproducible code, the choice of 10.5 results from playing with other values not shown here. You could try to automate the choice with a calculation based on the sample maximum and minimum and, naturally, your preference for where the labels should go. If you were producing dozens of these, that would be a good idea. For a single plot for a paper or presentation, I would just play.
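
    One way such a calculation might look, as a sketch only: put the labels a fixed fraction of the range below the sample minimum. The 5% offset is an arbitrary assumption of mine, not a recommendation; for mpg (minimum 12, maximum 41) it gives 10.55, close to the hand-picked 10.5.

    sysuse auto, clear
    egen count = count(mpg), by(rep78)

    * assumption: 5% of the range below the minimum; adjust the fraction to taste
    su mpg, meanonly
    gen where = r(min) - 0.05 * (r(max) - r(min))

    stripplot mpg , box vertical ms(none) pctile(5) over(rep78) ///
    yla(12 41 15(5)40, ang(h)) ///
    addplot(scatter where rep78, mla(count) ms(none) mlabpos(0) ///
    mlabsize(medsmall)) scheme(s1color)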