I am working with a dataset that includes 118,979 observations over 9 wide variables in Stata 16.0. The most prominent variable is whether a company-observation over multiple dates reports either "GPS" or "EPS". These companies can report both a "GPS" observation in a datapoint, as well as an "EPS" observation in the following datapoint. Please refer to the data overview below for further visualisation.
Datasample:
clear
input str8 cusip8 str16 cname str4 measure double actual long anndats_act float(fyear tanalyst meanforcast UE)
"87482X10" "TALMER BANCORP" "EPS" 1.21 20118 2014 29 .8686207 .3930131
"87482X10" "TALMER BANCORP" "GPS" 1.02 20479 2015 34 .8576471 .1893004
I need to drop the GPS observations (over multiple dates) once an identifier (being cusip8 in the table above) has reported an EPS over multiple dates. That is, if a company has reported GPS as well as EPS in e.g. January 1st, 2010, I want to drop the GPS observation such that the EPS is kept. If a company only reports a GPS, and does not report an EPS during a given date, I want to keep the GPS observation in my dataset.
The following works for me (adjust your variable names as required):
. clear
. input str10(company_id measure) month day year
company_id measure month day year
1. "Company A" "EPS" 1 1 2010
2. "Company A" "GPS" 1 1 2010
3. "Company A" "GPS" 1 1 2010
4. "Company A" "GPS" 1 2 2010
5. "Company B" "EPS" 1 2 2010
6. "Company B" "GPS" 1 1 2010
7. "Company C" "GPS" 1 4 2010
8. "Company C" "EPS" 1 4 2010
9. end
.
. gen date = mdy(month,day,year)
. format date %d
. drop month day year
.
. sort company_id date measure
.
. gen both = 0
. by company_id date: replace both = 1 if measure[1] == "EPS" & measure[2] == "GPS"
(5 real changes made)
.
. list, sepby(company_id)
+----------------------------------------+
| company~d measure date both |
|----------------------------------------|
1. | Company A EPS 01jan2010 1 |
2. | Company A GPS 01jan2010 1 |
3. | Company A GPS 01jan2010 1 |
4. | Company A GPS 02jan2010 0 |
|----------------------------------------|
5. | Company B GPS 01jan2010 0 |
6. | Company B EPS 02jan2010 0 |
|----------------------------------------|
7. | Company C EPS 04jan2010 1 |
8. | Company C GPS 04jan2010 1 |
+----------------------------------------+
.
. drop if measure == "GPS" & both == 1
(3 observations deleted)
.
. list, sepby(company_id)
+----------------------------------------+
| company~d measure date both |
|----------------------------------------|
1. | Company A EPS 01jan2010 1 |
2. | Company A GPS 02jan2010 0 |
|----------------------------------------|
3. | Company B GPS 01jan2010 0 |
4. | Company B EPS 02jan2010 0 |
|----------------------------------------|
5. | Company C EPS 04jan2010 1 |
+----------------------------------------+