I want to retain one copy of each company-year observation, with its subyear_total value, in my data.
Some companies have multiple entries for a given year, as recorded in the variable copies.
copies was created by:
bysort cik year: gen copies = _N
How can I remove the duplicates but keep one copy of each unique observation?
* Example generated by -dataex-. To install: ssc install dataex
clear
input int year long cik float(subyear_total copies)
1999 1750 425000 1
2005 1750 4232000 1
2006 1750 1.60e+07 1
2007 1750 182444 3
2007 1750 182444 3
2007 1750 182444 3
2008 1750 710909 3
2008 1750 710909 3
2008 1750 710909 3
2009 1750 5155390 5
2009 1750 5155390 5
2009 1750 5155390 5
2009 1750 5155390 5
2009 1750 5155390 5
end
So for example:
2007 has 3 entries and I want to keep one of those and drop the rest. Same thing for 2008 and 2009 (which has 5 entries).
If I do
drop if copies > 1
would I lose all instances of those years? How can I keep at least one?
The duplicates command could be used here, but in your case
bysort year cik : keep if _n == 1
gets you there directly. The variable copies is then of no obvious use.
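To answer the first part of the question: yes, drop if copies > 1 would remove every observation for 2007, 2008 and 2009, because each of those observations carries a value of copies greater than 1. For comparison, here is a minimal sketch of the duplicates route. It assumes, as in the example data, that the repeated observations are identical on every variable; the assert line is an added safeguard to check that assumption for subyear_total and is not part of the original suggestion.

```stata
* Check that repeated company-years agree on subyear_total before
* discarding any of them; -assert- stops with an error if they do not.
bysort cik year (subyear_total): assert subyear_total[1] == subyear_total[_N]

* Report how many surplus copies exist for each cik-year pair.
duplicates report cik year

* Drop observations that are identical on all variables,
* keeping the first of each set.
duplicates drop

* Confirm that cik and year now uniquely identify observations.
isid cik year
```

If other variables in the full dataset differ within a company-year, duplicates drop with no variable list will leave those observations in place. In that case duplicates drop cik year, force keeps the first observation of each pair, which is only sensible once you have decided that the discarded observations carry no extra information.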