Search code examples
statadata-cleaning

How to get non-consecutive entry and exit dates for datasets?


I have a unbalanced panel dataset from 2008 to 2018 of individuals in a labour force. Therefore, if individuals are working for a specific year, they will earn an income. It looks like this:

* Example generated by -dataex-. For more info, type help dataex
clear
input float(idno year income)
1 2008 100
1 2009 100
1 2010 100
1 2011 100
1 2012 100
1 2013 100
1 2014 100
1 2015 100
1 2016 100
1 2017 100
1 2018 100
2 2008 100
2 2009 100
2 2010 100
3 2009 100
3 2010 100
3 2015 100
3 2016 100
end

From this sample, we know that individual 1 (idno== 1) earns an income from 2008 to 2018; similarly, individual 2 (idno== 2) works during 2008-2010.

I would like to determine the year that an individual enters the labour force and leaves the labour force. Therefore, I tried the following:

I rectangularised the dataset (I use Stata 16):

fillin idno year

Then I determine whether individuals have worked in the dataset:

gen work = . 
replace work = 1 if income != .

Then I try to determine the start and end dates by individual (this would only work nicely for consecutive work periods)

bysort idno: gen years_earn_income = year if work == 1 
bysort idno: gen years_no_income = year if work == 0
bysort idno: gen start = min(years_earn_income)
bysort idno: gen end = max(years_earn_income)

I am struggling to get an appropriate entry and exit year for individuals that have multiple periods of employment. For example, individual 3 (idno== 3) works for 2009-2010 and 2015-2016. Therefore, I would want variables to reflect multiple employment periods as is the case for individual 3. I would appreciate any thoughts of how to think about this.


Solution

  • See https://www.stata-journal.com/article.html?article=dm0029 for a discussion of handling spells and tsspell from SSC for an implementation. Your example could be analysed like this:

     clear 
     input idno year income 
     1 2008 100
     1 2009 100
     1 2010 100
     1 2011 100
     1 2012 100
     1 2013 100
     1 2014 100
     1 2015 100
     1 2016 100
     1 2017 100
     1 2018 100
     2 2008 100
     2 2009 100
     2 2010 100
     3 2009 100
     3 2010 100
     3 2015 100
     3 2016 100
     end 
     
     tsset idno year 
     tsfill 
     
     ssc install tsspell 
     
     tsspell, pcond(income)
     
     list, sepby(idno _spell)
     
     list if _seq == 1 | _end, sepby(idno _spell)
    

    Here are the results:

    .  list, sepby(idno _spell)
    
         +---------------------------------------------+
         | idno   year   income   _seq   _spell   _end |
         |---------------------------------------------|
      1. |    1   2008      100      1        1      0 |
      2. |    1   2009      100      2        1      0 |
      3. |    1   2010      100      3        1      0 |
      4. |    1   2011      100      4        1      0 |
      5. |    1   2012      100      5        1      0 |
      6. |    1   2013      100      6        1      0 |
      7. |    1   2014      100      7        1      0 |
      8. |    1   2015      100      8        1      0 |
      9. |    1   2016      100      9        1      0 |
     10. |    1   2017      100     10        1      0 |
     11. |    1   2018      100     11        1      1 |
         |---------------------------------------------|
     12. |    2   2008      100      1        1      0 |
     13. |    2   2009      100      2        1      0 |
     14. |    2   2010      100      3        1      1 |
         |---------------------------------------------|
     15. |    3   2009      100      1        1      0 |
     16. |    3   2010      100      2        1      1 |
         |---------------------------------------------|
     17. |    3   2011        .      0        0      0 |
     18. |    3   2012        .      0        0      0 |
     19. |    3   2013        .      0        0      0 |
     20. |    3   2014        .      0        0      0 |
         |---------------------------------------------|
     21. |    3   2015      100      1        2      0 |
     22. |    3   2016      100      2        2      1 |
         +---------------------------------------------+
    
    
    .  list if _seq == 1 | _end, sepby(idno _spell)
    
         +---------------------------------------------+
         | idno   year   income   _seq   _spell   _end |
         |---------------------------------------------|
      1. |    1   2008      100      1        1      0 |
     11. |    1   2018      100     11        1      1 |
         |---------------------------------------------|
     12. |    2   2008      100      1        1      0 |
     14. |    2   2010      100      3        1      1 |
         |---------------------------------------------|
     15. |    3   2009      100      1        1      0 |
     16. |    3   2010      100      2        1      1 |
         |---------------------------------------------|
     21. |    3   2015      100      1        2      0 |
     22. |    3   2016      100      2        2      1 |
         +---------------------------------------------+