
Creating box plots of the gap between two groups by deciles


I work with Stata and I have math grades for two different groups: A and B.

I want to see the gap between the two groups at each decile. In addition, I want a box plot of this gap for each decile (10 box plots, one per decile, each showing the gap between the groups' grades).

What I first did was to compute the deciles using xtile for both groups:

xtile decileA= mat if group==1, nq(10)

xtile decileB= mat if group==0, nq(10)

However, groups A and B have neither the same number of observations nor the same distribution. I thought of computing quantiles within each decile for each group and subtracting them, to get the difference at each quantile within each decile and build the box plot from that. But I do not know how to proceed from there to create the graph, and since the number of observations in each decile differs between groups, I do not know whether this approach is correct.

If I try to use the pctile command and compute the difference at each decile, I lose all the variance in the data inside each decile. I only get median differences and not all the quantiles I want.

Example:

pctile decileA= mat if group==1, nq(10)

pctile decileB= mat if group==0, nq(10)

gen qdiff= decileA- decileB if _n<10

gen qtau=_n/10 if _n<10 

graph box qdiff, over(qtau)

Is there a way to produce the graph I have in mind?

Cross-posted on Statalist.


Solution

  • There is certainly a way to accomplish what you want with a bit of effort, but if the goal is to compare the two groups at each decile with some notion of variability, you can easily get that from a simultaneous quantile regression and the standard errors it produces:

    sysuse auto, clear
    
    sqreg price i.foreign, quantile(.1 .2 .3 .4 .5 .6 .7 .8 .9)
    
    margins, dydx(foreign) ///
    predict(outcome(q10))  ///
    predict(outcome(q20))  ///
    predict(outcome(q30))  ///
    predict(outcome(q40))  ///
    predict(outcome(q50))  ///
    predict(outcome(q60))  ///
    predict(outcome(q70))  ///
    predict(outcome(q80))  ///
    predict(outcome(q90))  ///
    post
    
    marginsplot, yline(0) xlab(, grid) ylab(#10, grid angle(90))
    

    This yields a graph showing that foreign origin is associated with a higher price at the higher deciles, with the exception of the top decile, though probably none of the differences is statistically significant here, given how much the confidence intervals overlap:

    [Figure: marginsplot of the effect of foreign on price at each decile, with confidence intervals]

    You can even conduct formal hypothesis tests that the effects are equal like this:

    . test _b[1.foreign:9._predict] =  _b[1.foreign:8._predict]
    
     ( 1)  - [1.foreign]8._predict + [1.foreign]9._predict = 0
    
               chi2(  1) =    3.72
             Prob > chi2 =    0.0537
    

    With 74 cars, we cannot reject at the 5% level (p = 0.0537) that the effects at the 80th and 90th percentiles are the same, even though the point estimates have opposite signs and similar magnitudes.
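
    To show how the same idea might carry over to the question's setup, here is a minimal sketch. It assumes the variables are named mat and group as in the question; the data are simulated purely for illustration, and the seed, sample size, effect sizes, and reps(100) are arbitrary assumptions rather than recommendations. The coefficient on 1.group in each equation (q10, q20, ...) is the estimated gap at that decile.

    ```stata
    * Simulated stand-in for the question's data (hypothetical values)
    clear
    set seed 12345
    set obs 500
    generate byte group = runiform() < 0.4        // 1 = group A, 0 = group B
    generate double mat = 60 + 5*group + rnormal(0, 10 + 3*group)

    * Gap between the groups at each decile, with bootstrap standard errors
    sqreg mat i.group, quantile(.1 .2 .3 .4 .5 .6 .7 .8 .9) reps(100)

    * List the estimated gap and its standard error for each decile
    forvalues q = 10(10)90 {
        display "decile `q': gap = " %6.2f _b[q`q':1.group] ///
            "  (SE " %5.2f _se[q`q':1.group] ")"
    }

    * Joint test that the gap is the same at the 10th, 50th, and 90th percentiles
    test (_b[q10:1.group] = _b[q50:1.group]) (_b[q50:1.group] = _b[q90:1.group])
    ```

    Because sqreg estimates all the quantile equations together and bootstraps their joint covariance, tests across deciles like the last line are possible without the margins step; the margins and marginsplot commands above can be added afterwards if a plot is wanted.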