I have a unbalanced panel dataset from 2008 to 2018 of individuals in a labour force. Therefore, if individuals are working for a specific year, they will earn an income. It looks like this:
* Example generated by -dataex-. For more info, type help dataex
clear
input float(idno year income)
1 2008 100
1 2009 100
1 2010 100
1 2011 100
1 2012 100
1 2013 100
1 2014 100
1 2015 100
1 2016 100
1 2017 100
1 2018 100
2 2008 100
2 2009 100
2 2010 100
3 2009 100
3 2010 100
3 2015 100
3 2016 100
end
From this sample, we know that individual 1 (idno
== 1) earns an income from 2008 to 2018; similarly, individual 2 (idno
== 2) works during 2008-2010.
I would like to determine the year that an individual enters the labour force and leaves the labour force. Therefore, I tried the following:
I rectangularised the dataset (I use Stata 16):
fillin idno year
Then I determine whether individuals have worked in the dataset:
gen work = .
replace work = 1 if income != .
Then I try to determine the start and end dates by individual (this would only work nicely for consecutive work periods)
bysort idno: gen years_earn_income = year if work == 1
bysort idno: gen years_no_income = year if work == 0
bysort idno: gen start = min(years_earn_income)
bysort idno: gen end = max(years_earn_income)
I am struggling to get an appropriate entry and exit year for individuals that have multiple periods of employment. For example, individual 3 (idno
== 3) works for 2009-2010 and 2015-2016. Therefore, I would want variables to reflect multiple employment periods as is the case for individual 3. I would appreciate any thoughts of how to think about this.
See https://www.stata-journal.com/article.html?article=dm0029 for a discussion of handling spells and tsspell
from SSC for an implementation. Your example could be analysed like this:
clear
input idno year income
1 2008 100
1 2009 100
1 2010 100
1 2011 100
1 2012 100
1 2013 100
1 2014 100
1 2015 100
1 2016 100
1 2017 100
1 2018 100
2 2008 100
2 2009 100
2 2010 100
3 2009 100
3 2010 100
3 2015 100
3 2016 100
end
tsset idno year
tsfill
ssc install tsspell
tsspell, pcond(income)
list, sepby(idno _spell)
list if _seq == 1 | _end, sepby(idno _spell)
Here are the results:
. list, sepby(idno _spell)
+---------------------------------------------+
| idno year income _seq _spell _end |
|---------------------------------------------|
1. | 1 2008 100 1 1 0 |
2. | 1 2009 100 2 1 0 |
3. | 1 2010 100 3 1 0 |
4. | 1 2011 100 4 1 0 |
5. | 1 2012 100 5 1 0 |
6. | 1 2013 100 6 1 0 |
7. | 1 2014 100 7 1 0 |
8. | 1 2015 100 8 1 0 |
9. | 1 2016 100 9 1 0 |
10. | 1 2017 100 10 1 0 |
11. | 1 2018 100 11 1 1 |
|---------------------------------------------|
12. | 2 2008 100 1 1 0 |
13. | 2 2009 100 2 1 0 |
14. | 2 2010 100 3 1 1 |
|---------------------------------------------|
15. | 3 2009 100 1 1 0 |
16. | 3 2010 100 2 1 1 |
|---------------------------------------------|
17. | 3 2011 . 0 0 0 |
18. | 3 2012 . 0 0 0 |
19. | 3 2013 . 0 0 0 |
20. | 3 2014 . 0 0 0 |
|---------------------------------------------|
21. | 3 2015 100 1 2 0 |
22. | 3 2016 100 2 2 1 |
+---------------------------------------------+
. list if _seq == 1 | _end, sepby(idno _spell)
+---------------------------------------------+
| idno year income _seq _spell _end |
|---------------------------------------------|
1. | 1 2008 100 1 1 0 |
11. | 1 2018 100 11 1 1 |
|---------------------------------------------|
12. | 2 2008 100 1 1 0 |
14. | 2 2010 100 3 1 1 |
|---------------------------------------------|
15. | 3 2009 100 1 1 0 |
16. | 3 2010 100 2 1 1 |
|---------------------------------------------|
21. | 3 2015 100 1 2 0 |
22. | 3 2016 100 2 2 1 |
+---------------------------------------------+