
Stata: Order of legend items with multiple axes


I am trying to plot two series on separate y axes using twoway. The first plot uses the variable route as a marker label, via mlabel() and msymbol(none), in place of the usual marker symbol. The second plot needs no label in the plot region, but ideally should be keyed in the legend as "Incr. Dose".

An undesired result arises when using the legend(order()) specification to create the legend labels. The legend as coded has the symbol for the second axis as the 3rd ordered legend item. I would expect the code to place the symbol for the second axis as the 5th item. The problem can be fixed by adding a dummy category for an unused 5th route and moving the symbol by hand using the Graph Editor. I would like to know why legend(order()) behaves this way, whether there is some interaction with mlabel() or msymbol(none), and whether a coding solution is available for repeated use.

*create data
clear 
set seed 42
set obs 50

gen cuml_dose = rnormal(0,1) *10 + 100
sort cuml_dose
gen interval = [_n] 
gen id = 1
gen incr_dose =0
replace incr_dose = cuml_dose[_n+1] - cuml_dose if [_n] > 1
gen route = rpoisson(1)
tab route,m

*create problem graph
sort interval
twoway  scatter cuml_dose interval, mlabel(route) msymbol(none) yaxis(1) || ///
    scatter incr_dose interval,  yaxis(2) ///
    legend(on) legend(order(0 "0=oral" 1 "1=IV" 2 "2=IM" 3 "3=patch"))

*partial solution
twoway  scatter cuml_dose interval, mlabel(route) msymbol(none) yaxis(1) || ///
    scatter incr_dose interval,  yaxis(2) ///
    legend(on) legend(order(0 "0=oral" 1 "1=IV" 2 "2=IM" 3 "3=patch" 4 "Incr. dose"))

Solution

  • Thanks for the self-contained example.

    Consider

    scatter cuml_dose interval, mlabel(route) ms(none) yaxis(1) /// 
    || scatter incr_dose interval, yaxis(2) ///
    legend(on order(- "0 = Oral" - "1 = IV" - "2 = IM" - "3 = patch" 2 "Incr. dose")) 
    

    In your last graph,

    1. cuml_dose is the first variable plotted, so it is plot 1 for the purposes of order(). That you show marker labels with several distinct values is irrelevant to the counting, so the numbers 0 to 3 in your order() do not correspond to the values of route. I'm surprised that a reference to 0 is allowed in order(). Most crucially, marker labels are just text, and graph doesn't care what the text is beyond showing it as instructed.

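    2. incr_dose is the second variable plotted: hence use order(2 ... ). That is also why its symbol ended up beside your third legend entry: the 2 in 2 "2=IM" refers to the second plot, not to route 2.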

    3. You can add to the legend arbitrarily with dash syntax, as above.

    4. I can't see that using two axes causes any problem for defining the legend. We're just counting what is plotted, whichever y axis it is on.

    Bottom line: count what is plotted on any y axis in order of mention of variables in the syntax.
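
    As a minimal sketch of that counting rule, here is an illustration that is not part of the original example. It assumes the auto dataset shipped with Stata and should be run from a do-file so that /// works as a line continuation. The numbers in order() refer to the first and second scatter, regardless of which y axis each uses, and a dash adds a text-only entry.

    * illustration only: auto.dta stands in for the dose data
    sysuse auto, clear

    * plot 1 = mpg (y axis 1), plot 2 = price (y axis 2)
    * order() lists plot 2 first, then a text-only line, then plot 1
    twoway scatter mpg weight, yaxis(1) ///
        || scatter price weight, yaxis(2) ///
        legend(on order(2 "Price" - "text-only entry" 1 "Mileage"))

    Swapping the 1 and the 2 in order() swaps which symbol sits beside which label; moving a plot to the other y axis changes nothing in the legend.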