
Is there a way to jitter confidence intervals in a scatterplot in Stata?


I am trying to create a plot of mean values over time, with associated confidence intervals (CIs), for two groups (foreign==0 and foreign==1) using twoway scatter and rcap. However, it is difficult to distinguish between the CIs when they overlap, even if I use different colours or line styles.

I tried to use the jitter option to randomly offset the points on the plot. While this works for scatter, it does not appear to work for rcap, which I am using to plot the CIs: the option is accepted without error, but the CIs are not jittered. I had hoped that by using the jitterseed option with the same seed, I would be able to offset both the markers from scatter and the CIs from rcap to the same location on the plot.

This post on Statalist from 2005 suggests that rcap does not support jitter. I can't find any reference to the jitter option in the current rcap documentation, so I assume this is still the case. I am open to solutions that plot CIs around means using a plotting command other than rcap. Thanks in advance.

Reproducible example:

sysuse auto, clear

* Generate required statistics by repair record and foreign
collapse (mean)     mean_price = price      ///
         (sd)       sd_price = price        ///
         (count)    n_price = price , by(foreign rep78)

* Compute confidence intervals
gen lb_price = mean_price - invttail(n_price-1,0.025)*(sd_price / sqrt(n_price))
gen ub_price = mean_price + invttail(n_price-1,0.025)*(sd_price / sqrt(n_price))

* No jitter
twoway  (scatter mean_price rep78 if foreign == 0, ///
        c(L) lcol(black) msym(O) mcol(black) ) ///
        ///
    (scatter mean_price rep78 if foreign == 1, ///
        c(L) lcol(black) msym(O) mcol(black) ) ///  
        ///
    (rcap lb_price ub_price rep78 if foreign == 0, ///
    lcol(black) ) ///
        ///
    (rcap lb_price ub_price rep78 if foreign == 1, ///
    lcol(black) )

* With jitter
twoway  (scatter mean_price rep78 if foreign == 0, ///
            c(L) lcol(black) msym(O) mcol(black) jitter(10) jitterseed(123)) ///
            ///
        (scatter mean_price rep78 if foreign == 1, ///
            c(L) lcol(black) msym(O) mcol(black) jitter(10) jitterseed(456)) ///
            ///
        (rcap lb_price ub_price rep78 if foreign == 0, ///
        lcol(black) jitter(10) jitterseed(123)) ///
            ///
        (rcap lb_price ub_price rep78 if foreign == 1, ///
        lcol(black) jitter(10) jitterseed(456))

Solution

  • Thanks for the reproducible example. The post you cite from 2007 isn't merely suggesting that jitter() isn't allowed with twoway rcap: it's correctly stating that as a fact. The documentation says that jitter() applies to points drawn by scatter or graph matrix, and if it applied elsewhere, that would be documented explicitly.

    But this isn't a cause for regret. Jittered spikes or capped bars for confidence intervals would look a mess, and they wouldn't be guaranteed to align with jittered markers for point estimates either; the graph would look too much like a small child's drawing. Using the same seed wouldn't help even in principle: jitter() adds random noise to y as well as to x, so jittered markers would no longer sit at the means they are meant to show.

    Stata doesn't support what you need directly within twoway graphics, so some work setting up offsets is needed. For paired estimates and confidence intervals, I displace the two displays left and right of the true x value. In this example the offset is small enough, relative to the range of the data, that the automatic x-axis labels make sense; in other problems you might need to spell out the labels you want, for example with xlabel().

    I dislike the parenthesis notation for separating plots within a single twoway command, as there are quite enough parentheses already, so I use || separators instead. Nor do I follow why you would want to insist on the same colour and marker symbol for both groups, but that may reflect some logic or need specific to your real problem.

    With, say, three groups to compare, I would shift one estimate left and one right, and leave the third in the middle. With four or more groups you often need to think harder.

    The offset idea has no doubt occurred to many people, and it is common in the literature, but it was written up in a Stata Journal tip by James Cui.

    sysuse auto, clear
    
    * Generate required statistics by repair record and foreign
    collapse (mean)     mean_price = price      ///
             (sd)       sd_price = price        ///
             (count)    n_price = price , by(foreign rep78)
    
    * Compute confidence intervals
    gen lb_price = mean_price - invttail(n_price-1,0.025)*(sd_price / sqrt(n_price))
    gen ub_price = mean_price + invttail(n_price-1,0.025)*(sd_price / sqrt(n_price))
    
    local offset 0.1 
    gen rep78_L = rep78 - `offset'
    gen rep78_R = rep78 + `offset'
    
    * Offset the two groups left and right; no jitter needed
    twoway scatter mean_price rep78_L if foreign == 0, c(L) lcol(blue) msym(Oh) mcol(blue) ///
       || scatter mean_price rep78_R if foreign == 1, c(L) lcol(orange) msym(Th) mcol(orange)  ///  
       || rcap lb_price ub_price rep78_L if foreign == 0, lcol(blue)  ///
       || rcap lb_price ub_price rep78_R if foreign == 1, lcol(orange)  /// 
        xtitle("Repair record 1978") ytitle(Price (USD)) legend(order(1 "Domestic" 2 "Foreign"))
    


    Something like this may be supported in community-contributed commands such as coefplot, a wonderful command I never use.
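    The left/middle/right arrangement suggested above for three groups can be sketched in the same style. This is a minimal sketch under stated assumptions: the auto data have only two levels of foreign, so the three groups here are hypothetical weight terciles made with xtile purely for illustration, and the offset of 0.15 and the colours are arbitrary choices. Group g (1, 2, 3) is shifted by (g - 2) times the offset, and xlabel(1/5) spells out the axis labels rather than relying on the automatic choice.

```stata
* Sketch: three groups shifted left, middle and right (no jitter)
sysuse auto, clear

* Hypothetical grouping for illustration only: terciles of weight (1, 2, 3)
xtile wt3 = weight, nquantiles(3)

* Mean, SD and count of price by group and repair record
collapse (mean)  mean_price = price ///
         (sd)    sd_price   = price ///
         (count) n_price    = price, by(wt3 rep78)

* t-based 95% CIs; cells with a single car get missing limits and no cap
gen lb_price = mean_price - invttail(n_price-1, 0.025)*(sd_price / sqrt(n_price))
gen ub_price = mean_price + invttail(n_price-1, 0.025)*(sd_price / sqrt(n_price))

* Assumed offset: group 1 moves left, group 2 stays put, group 3 moves right
local offset 0.15
gen rep78_off = rep78 + (wt3 - 2) * `offset'

* Same colour for each group's markers and caps; || separates the plots
twoway scatter mean_price rep78_off if wt3 == 1, c(L) msym(Oh) mcol(navy) lcol(navy) ///
    || scatter mean_price rep78_off if wt3 == 2, c(L) msym(Th) mcol(maroon) lcol(maroon) ///
    || scatter mean_price rep78_off if wt3 == 3, c(L) msym(Sh) mcol(forest_green) lcol(forest_green) ///
    || rcap lb_price ub_price rep78_off if wt3 == 1, lcol(navy) ///
    || rcap lb_price ub_price rep78_off if wt3 == 2, lcol(maroon) ///
    || rcap lb_price ub_price rep78_off if wt3 == 3, lcol(forest_green) ///
       xlabel(1/5) xtitle("Repair record 1978") ytitle("Price (USD)") ///
       legend(order(1 "Lightest tercile" 2 "Middle tercile" 3 "Heaviest tercile"))
```

    With k groups, the same idea generalizes to shifting group g by (g - (k + 1)/2) times the offset, which keeps the displays centred on the true x value; the offset then has to shrink as k grows so that displays for neighbouring x values do not collide.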