Search code examples
stata

Keep firms with four consecutive years


I am using ExecuComp/Compustat data that look something like this:

  gvkey|Year |Ticker  |
-------|-----|--------|
 001111| 2006| abc    |
 001111| 2007| abc    |
 001111| 2008| abc    |
 001111| 2009| abc    |
 001112| 2006| www    |
 001112| 2009| www    |
 001113| 2008| ttt    |
 001114| 2010| vvv    |

How can I only keep firms that have any 4 consecutive years of observations?

I tried to read the FAQ and related forum questions, but I cannot seem to get my head around it.


Solution

  • The following works for me:

    clear
    
    input gvkey Year str3 Ticker
    001111 2006 abc     
    001111 2007 abc     
    001111 2008 abc     
    001111 2009 abc     
    001112 2006 www     
    001112 2009 www     
    001113 2008 ttt     
    001114 2010 vvv  
    end
    
    tsset gvkey Year
    
    tsspell, f(L.Year == .) 
    
    egen tag = max(_seq), by(gvkey _spell)
    
    keep if tag == 4
    
    list
    
         +----------------------------------------------------+
         | gvkey   Year   Ticker   _spell   _seq   _end   tag |
         |----------------------------------------------------|
      1. |  1111   2006      abc        1      1      0     4 |
      2. |  1111   2007      abc        1      2      0     4 |
      3. |  1111   2008      abc        1      3      0     4 |
      4. |  1111   2009      abc        1      4      1     4 |
         +----------------------------------------------------+