
How to add a factor/group variable to a line plot in Stata


I would like to plot a continuous variable over time using xtline and overlay a scatterplot or marker label on each data point showing the group membership at that point.

* Example generated by -dataex-. To install: ssc install dataex
clear
input double(id year group variable)
 101 2003 3 12
 102 2003 2 10
 102 2005 1 10
 102 2007 2 10
 102 2009 1 10
 102 2011 2 10
 103 2003 4  3
 103 2005 2  1
 104 2003 4 50
 105 2003 4  8
 105 2005 4 12
 105 2007 4 12
 105 2009 4 12
 106 2003 1  6
 106 2005 1 28
 106 2007 2 15
 106 2009 2  4
 106 2011 3  4
 106 2015 1  2
 106 2017 1  2
end

xtset id year

xtline variable, overlay


As an illustration, I manually labelled the group at each data point for id 103.


I have four groups, which I would also like to see in the legend.

Solutions

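preserve
* one variable per id: variable101, ..., variable106
separate variable, by(id) veryshortlabel
* one line per id, plus invisible markers that carry the group as a label
line variable101-variable106 year  ///
    || scatter variable year,  ///
    mla(group) ms(none) mlabc(black) ytitle(variable)
restore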

Alternatively, stay with xtline and add the labelled scatter through addplot():

xtline variable, overlay addplot(scatter variable year, mlabel(group))

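Neither command above puts the four groups into the legend. A minimal sketch of one way to do that, assuming the example data from the question: split the outcome into one marker series per group, so that each group gets its own legend key. The stub grp and the legend text "group 1" to "group 4" are names made up for this illustration.

```stata
* Sketch only: grp1-grp4 and the legend text are illustrative, not from the question
preserve
* one marker variable per group: grp1, ..., grp4
separate variable, by(group) generate(grp)
sort id year
* c(L) connects points only while year increases, so each id gets its own line
twoway line variable year, c(L) lc(gs10)  ///
    || scatter grp1 grp2 grp3 grp4 year,  ///
    legend(order(2 "group 1" 3 "group 2" 4 "group 3" 5 "group 4")) ytitle(variable)
restore
```

The trade-off is that all lines share one colour, so the legend identifies groups rather than identifiers. If both are needed, the marker labels from the commands above can be kept alongside.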


Solution

  • I recommend direct labelling here. It is likely to yield a slightly messy graph, but your own example is already messy and will only get worse if you add more details.

    Here is a reproducible example.

    webuse grunfeld, clear
    set scheme s1color
    * one variable per company: invest1, ..., invest10
    separate invest, by(company) veryshortlabel

    * label each line once, at the last year (1954), with the company identifier
    line invest1-invest10 year, ysc(log)    ///
    || scatter invest year if year == 1954,  ///
    mla(company) ms(none) mlabc(black) legend(off)  ///
    yla(1 10 100 1000, ang(h)) ytitle(investment)
    

    EDIT:

    In your example, two identifiers (101 and 104) are each observed in only one year, so there is no line to draw for them. To show some technique for line plots with panel data, I focus on the others.

    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input double(id year group variable)
     101 2003 3 12
     102 2003 2 10
     102 2005 1 10
     102 2007 2 10
     102 2009 1 10
     102 2011 2 10
     103 2003 4  3
     103 2005 2  1
     104 2003 4 50
     105 2003 4  8
     105 2005 4 12
     105 2007 4 12
     105 2009 4 12
     106 2003 1  6
     106 2005 1 28
     106 2007 2 15
     106 2009 2  4
     106 2011 3  4
     106 2015 1  2
     106 2017 1  2
    end
    
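    * flag identifiers observed in more than one year
    bysort id : gen include = _N > 1
    * fabplot is a community-contributed command; it needs installing only once
    ssc install fabplot
    set scheme s1color
    fabplot line variable year if include, xla(2003 " 2003" 2010 2017 "2017 ") by(id) frontopts(lw(thick)) xtitle("")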
    
