Search code examples
statapanel-data

Trimming my panel dataset - filtering out observations meeting criterion if preceding ID meets the complementary criterion


I am working with a dataset that includes 118,979 observations over 9 wide variables in Stata 16.0. The most prominent variable is whether a company-observation over multiple dates reports either "GPS" or "EPS". These companies can report both a "GPS" observation in a datapoint, as well as an "EPS" observation in the following datapoint. Please refer to the data overview below for further visualisation.

Datasample:

clear
input str8 cusip8 str16 cname str4 measure double actual long anndats_act float(fyear tanalyst meanforcast UE)
"87482X10" "TALMER BANCORP"   "EPS"   1.21 20118 2014  29   .8686207     .3930131
"87482X10" "TALMER BANCORP"   "GPS"   1.02 20479 2015  34   .8576471     .1893004

I need to drop the GPS observations (over multiple dates) once an identifier (being cusip8 in the table above) has reported an EPS over multiple dates. That is, if a company has reported GPS as well as EPS in e.g. January 1st, 2010, I want to drop the GPS observation such that the EPS is kept. If a company only reports a GPS, and does not report an EPS during a given date, I want to keep the GPS observation in my dataset.


Solution

  • The following works for me (adjust your variable names as required):

    . clear
    
    . input str10(company_id measure) month day year
    
         company_id measure month day year
      1. "Company A" "EPS" 1 1 2010
      2. "Company A" "GPS" 1 1 2010 
      3. "Company A" "GPS" 1 1 2010
      4. "Company A" "GPS" 1 2 2010
      5. "Company B" "EPS" 1 2 2010
      6. "Company B" "GPS" 1 1 2010
      7. "Company C" "GPS" 1 4 2010
      8. "Company C" "EPS" 1 4 2010
      9. end
    
    . 
    . gen date = mdy(month,day,year)
    
    . format date %d
    
    . drop month day year
    
    . 
    . sort company_id date measure
    
    . 
    . gen both = 0
    
    . by company_id date: replace both = 1 if measure[1] == "EPS" & measure[2] == "GPS"
    (5 real changes made)
    
    . 
    . list, sepby(company_id)
    
         +----------------------------------------+
         | company~d   measure        date   both |
         |----------------------------------------|
      1. | Company A       EPS   01jan2010      1 |
      2. | Company A       GPS   01jan2010      1 |
      3. | Company A       GPS   01jan2010      1 |
      4. | Company A       GPS   02jan2010      0 |
         |----------------------------------------|
      5. | Company B       GPS   01jan2010      0 |
      6. | Company B       EPS   02jan2010      0 |
         |----------------------------------------|
      7. | Company C       EPS   04jan2010      1 |
      8. | Company C       GPS   04jan2010      1 |
         +----------------------------------------+
    
    . 
    . drop if measure == "GPS" & both == 1
    (3 observations deleted)
    
    . 
    . list, sepby(company_id)
    
         +----------------------------------------+
         | company~d   measure        date   both |
         |----------------------------------------|
      1. | Company A       EPS   01jan2010      1 |
      2. | Company A       GPS   02jan2010      0 |
         |----------------------------------------|
      3. | Company B       GPS   01jan2010      0 |
      4. | Company B       EPS   02jan2010      0 |
         |----------------------------------------|
      5. | Company C       EPS   04jan2010      1 |
         +----------------------------------------+