I am working with a spell dataset that has the following form:
clear all
input persid start end t_start t_end spell_type year spell_number event
1 8 9 44 45 1 1999 1 0
1 12 12 60 60 1 2000 1 0
1 1 1 61 61 1 2001 1 0
1 7 11 67 71 1 2001 2 0
1 1 4 85 88 2 2003 1 0
1 5 7 89 91 1 2003 2 1
1 8 11 92 95 2 2003 3 0
1 1 1 97 97 2 2004 1 0
1 1 3 121 123 1 2006 1 1
1 4 5 124 125 2 2006 2 0
1 6 9 126 129 1 2006 3 1
1 10 11 130 131 2 2006 4 0
1 12 12 132 132 1 2006 5 1
1 1 12 157 168 1 2009 1 0
1 1 12 169 180 1 2010 1 0
1 1 12 181 192 1 2011 1 0
1 1 12 193 204 1 2012 1 0
1 1 12 205 216 1 2013 1 0
end
lab define lab_spelltype 1 "unemployment spell" 2 "employment spell"
lab val spell_type lab_spelltype
where persid
is the id of the person; start
and end
are the months when the yearly unemployment/employment spell starts and ends, respectively; t_start
and t_end
are the same measures but starting to count from 1st January 1996; event
is equal to 1 for the employment entries for which the previous row was an unemployment spell.
The data is such that there are no overlapping spells during a given year, and each year contiguous spells of the same type have been merged together.
My goal is, for each row such that event
is 1, to compute the number of months spent as employed in the last 6 months and 24 months.
In this specific example, what I would like to get is:
clear all
input persid start end t_start t_end spell_type year spell_number event empl_6 empl_24
1 8 9 44 45 1 1999 1 0 . .
1 12 12 60 60 1 2000 1 0 . .
1 1 1 61 61 1 2001 1 0 . .
1 7 11 67 71 1 2001 2 0 . .
1 1 4 85 88 2 2003 1 0 . .
1 5 7 89 91 1 2003 2 1 0 5
1 8 11 92 95 2 2003 3 0 . .
1 1 1 97 97 2 2004 1 0 . .
1 1 3 121 123 1 2006 1 1 0 0
1 4 5 124 125 2 2006 2 0 . .
1 6 9 126 129 1 2006 3 1 3 3
1 10 11 130 131 2 2006 4 0 . .
1 12 12 132 132 1 2006 5 1 4 7
1 1 12 157 168 1 2009 1 0 . .
1 1 12 169 180 1 2010 1 0 . .
1 1 12 181 192 1 2011 1 0 . .
1 1 12 193 204 1 2012 1 0 . .
1 1 12 205 216 1 2013 1 0 . .
end
So the idea is that I have to go back to rows preceding each event==1
entry and count how many months the individual was employed.
Can you suggest a way to obtain this final result?
Some suggested to expand
the dataset, but perhaps there are better ways to tackle the problem (especially because the dataset is quite large).
EDIT
The correct labeling of the employment status is:
lab define lab_spelltype 1 "employment spell" 2 "unemployment spell"
The number of past months spent in employment (empl_6
and empl_24
) and the definition of event
are now correct with this label.
A solution to the problem is to:
tsfill
and finally,sum()
and lag operators to get the running sum for the last 6 and 24 months.See also Robert solution for some ideas I borrowed.
Important: this is almost surely not an efficient way to solve the issue, especially if the data is large (as in my case). However, the plus is that one actually "sees" what happens in background to make sure the final result is the one desired.
Also, importantly, this solution takes into account cases where 2 (or more) events happen within 6 (or 24) months from each other.
clear all
input persid start end t_start t_end spell_type year spell_number event
1 8 9 44 45 1 1999 1 0
1 12 12 60 60 1 2000 1 0
1 1 1 61 61 1 2001 1 0
1 7 11 67 71 1 2001 2 0
1 1 4 85 88 2 2003 1 0
1 5 7 89 91 1 2003 2 1
1 8 11 92 95 2 2003 3 0
1 1 1 97 97 2 2004 1 0
1 1 3 121 123 1 2006 1 1
1 4 5 124 125 2 2006 2 0
1 6 9 126 129 1 2006 3 1
1 10 11 130 131 2 2006 4 0
1 12 12 132 132 1 2006 5 1
1 1 12 157 168 1 2009 1 0
1 1 12 169 180 1 2010 1 0
1 1 12 181 192 1 2011 1 0
1 1 12 193 204 1 2012 1 0
1 1 12 205 216 1 2013 1 0
end
lab define lab_spelltype 1 "employment" 2 "unemployment"
lab val spell_type lab_spelltype
list
* generate Stata monthly dates
gen spell_start = ym(year,start)
gen spell_end = ym(year,end)
format %tm spell_start spell_end
list
* expand to monthly data
gen n = spell_end - spell_start + 1
expand n, gen(expanded)
sort persid year spell_number (expanded)
bysort persid year spell_number: gen month = spell_start + _n - 1
by persid year spell_number: replace event = 0 if _n > 1
format %tm month
* xtset, fill months gaps with "empty" rows, use lags and cumsum to count past months in employment
xtset persid month, monthly // %tm format
tsfill
bysort persid (month): gen cumsum = sum(spell_type) if spell_type==1
bysort persid (month): replace cumsum = cumsum[_n-1] if cumsum==.
bysort persid (month): gen m6 = cumsum-1 - L7.cumsum if event==1 // "-1" otherwise it sums also current empl month
bysort persid (month): gen m24 = cumsum-1 - L25.cumsum if event==1
drop if event==.
list persid start end year m* if event