Generation of a counter variable for episodes in panel data in Stata


I am trying to generate a counter variable that describes the duration of a temporal episode in panel data. I am using long format data that looks something like this:

clear
input byte id int time byte var1 int aim1
1 1 0 .
1 2 0 .
1 3 1 1
1 4 1 2
1 5 0 .
1 6 0 .
1 7 0 .
2 1 0 .
2 2 1 1
2 3 1 2
2 4 1 3
2 5 0 . 
2 6 1 1
2 7 1 2
end

I want to generate a variable like aim1 that starts with a value of 1 when var1==1, and counts up one unit with each subsequent observation per ID where var1 is still equal to 1. For each observation where var1!=1, aim1 should contain missing values.

I already tried using rangestat (count) to solve the problem, but the resulting variable does not restart the count with each episode:

ssc install rangestat
gen var2=1 if var1==1
rangestat (count) aim2=var2, interval(time -7 0) by(id)

Solution

  • Here are two ways to do it: (1) from first principles (but see this paper for more) and (2) using tsspell from SSC.

    clear
    input byte id int time byte var1 int aim1
    1 1 0 .
    1 2 0 .
    1 3 1 1
    1 4 1 2
    1 5 0 .
    1 6 0 .
    1 7 0 .
    2 1 0 .
    2 2 1 1
    2 3 1 2
    2 4 1 3
    2 5 0 . 
    2 6 1 1
    2 7 1 2
    end
    
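    * Start each spell at 1: var1 is 1 here but was not 1 in the previous observation.
    * var1[_n-1] is missing for the first observation of each id, so that case also starts at 1.
    bysort id (time) : gen wanted = 1 if var1 == 1 & var1[_n-1] != 1 
    * replace works through observations in order, so each new value builds on the one before.
    by id: replace wanted = wanted[_n-1] + 1 if var1 == 1 & missing(wanted)
    
    * tsspell requires tsset data; it needs to be installed once from SSC.
    tsset id time
    ssc install tsspell 
    
    tsspell, cond(var1 == 1)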
    
    list, sepby(id _spell)
    
         +---------------------------------------------------------+
         | id   time   var1   aim1   wanted   _seq   _spell   _end |
         |---------------------------------------------------------|
      1. |  1      1      0      .        .      0        0      0 |
      2. |  1      2      0      .        .      0        0      0 |
         |---------------------------------------------------------|
      3. |  1      3      1      1        1      1        1      0 |
      4. |  1      4      1      2        2      2        1      1 |
         |---------------------------------------------------------|
      5. |  1      5      0      .        .      0        0      0 |
      6. |  1      6      0      .        .      0        0      0 |
      7. |  1      7      0      .        .      0        0      0 |
         |---------------------------------------------------------|
      8. |  2      1      0      .        .      0        0      0 |
         |---------------------------------------------------------|
      9. |  2      2      1      1        1      1        1      0 |
     10. |  2      3      1      2        2      2        1      0 |
     11. |  2      4      1      3        3      3        1      1 |
         |---------------------------------------------------------|
     12. |  2      5      0      .        .      0        0      0 |
         |---------------------------------------------------------|
     13. |  2      6      1      1        1      1        2      0 |
     14. |  2      7      1      2        2      2        2      1 |
         +---------------------------------------------------------+
    

    The approach of tsspell is very close to what you ask for, with two points to note: (a) its counter (by default _seq) is 0 rather than missing when out of spell, but replace _seq = . if _seq == 0 gets what you ask for; (b) its auxiliary variables (by default _spell and _end) are useful in many problems. Install tsspell with ssc install tsspell before you use it.
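
    As a minimal sketch of that last adjustment, the lines below set the counter to missing outside spells and check it against the other two variables. They assume the example data are in memory and that both the first-principles code and tsspell, cond(var1 == 1) have already been run. The variable name spell_len is hypothetical and not something tsspell creates; it only illustrates one use of _spell.

    ```stata
    * Assumes _seq and _spell exist (created by tsspell) and that wanted
    * exists (created by the first-principles code above).

    * Set the counter to missing outside spells, as the question asks.
    replace _seq = . if _seq == 0

    * Both checks should pass silently: in Stata, missing == missing is true.
    assert _seq == aim1
    assert _seq == wanted

    * Hypothetical extra: total length of each spell, missing outside spells.
    bysort id _spell (time) : gen spell_len = _N if _spell > 0
    ```

    Note that bysort changes the sort order, so sort id time again before using any time-series operators.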