Search code examples
panelstata

Generate a lead variable in a panel


I want to generate a lead income variable V1 by CMA, Province and Year as follows:

CMA  Province  Year  Income  V1
TO      A      1990   $5     $8
TO      A      1991   $8     .
TO      A      1991   $8     .
MT      A      1990   $12    $7
MT      A      1991   $7     .
MT      A      1991   $7     .
AB      B      1990   $5     $15
AB      B      1991   $15    .
AB      B      1991   $15    .

I am using the following code, but I am getting only missing values:

gsort CMA Province -Year Income
gen ld_inc = Income[_n-1] if Year == Year[_n-1] + 1
replace ld_inc = ld_inc[_n-1] if Year == Year[_n-1] & missing(ld_inc)

Is there something wrong with my code and what I am getting wrong?


Solution

  • The following works for your example:

    clear
    
    input str2 CMA str2 Province Year Income V1
    "TO"  "A" 1990  5  8
    "."   "." 1991  8  .
    "MT"  "A" 1990 12  7
    "."   "." 1991  7  .
    "AB"  "B" 1990  5 15
    "."   "." 1990 15  .
    end
    
    generate V2 = Income[_n+1] if CMA != "."
    
    list, separator(0)
    
         +------------------------------------------+
         | CMA   Province   Year   Income   V1   V2 |
         |------------------------------------------|
      1. |  TO          A   1990        5    8    8 |
      2. |   .          .   1991        8    .    . |
      3. |  MT          A   1990       12    7    7 |
      4. |   .          .   1991        7    .    . |
      5. |  AB          B   1990        5   15   15 |
      6. |   .          .   1991       15    .    . |
         +------------------------------------------+
    

    However, normally it should be done using an identifier as follows:

    egen id = seq(), block(2)
    bysort id (Year Province): generate V3 = Income[_n+1]
    
    order id
    list, separator(0)
    
         +----------------------------------------------------+
         | id   CMA   Province   Year   Income   V1   V2   V3 |
         |----------------------------------------------------|
      1. |  1    TO          A   1990        5    8    8    8 |
      2. |  1     .          .   1991        8    .    .    . |
      3. |  2    MT          A   1990       12    7    7    7 |
      4. |  2     .          .   1991        7    .    .    . |
      5. |  3    AB          B   1990        5   15   15   15 |
      6. |  3     .          .   1991       15    .    .    . |
         +----------------------------------------------------+
    

    EDIT:

    Using your new example, things are even more straightforward:

    clear
    
    input str2 CMA  str1 Province  Year  Income  V1
    TO      A      1990   5     8
    TO      A      1991   8     .
    TO      A      1991   8     .
    MT      A      1990   12    7
    MT      A      1991   7     .
    MT      A      1991   7     .
    AB      B      1990   5     15
    AB      B      1991   15    .
    AB      B      1991   15    .
    end
    
    bysort CMA (Year Province): generate V2 = Income[_n+1] if _n == 1
    
    list, sepby(CMA)
    
         +------------------------------------------+
         | CMA   Province   Year   Income   V1   V2 |
         |------------------------------------------|
      1. |  AB          B   1990        5   15   15 |
      2. |  AB          B   1991       15    .    . |
      3. |  AB          B   1991       15    .    . |
         |------------------------------------------|
      4. |  MT          A   1990       12    7    7 |
      5. |  MT          A   1991        7    .    . |
      6. |  MT          A   1991        7    .    . |
         |------------------------------------------|
      7. |  TO          A   1990        5    8    8 |
      8. |  TO          A   1991        8    .    . |
      9. |  TO          A   1991        8    .    . |
         +------------------------------------------+