
Stata: Keep the first observation by group


I have a data set that looks like this:

id  firm  earnings    A
1   A      100        0
1   A      200        0
2   B      50         1
2   B      70         1  
3   C      900        0

Within each id-firm group (bys id firm), I want to keep only the first observation if A==0 and keep all observations if A==1.

I've tried the following code:

    if A==0 {
        bys id firm: keep if _n==1
    }

However, this code drops all observations with _n>1, regardless of the value of A.
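A minimal sketch that reproduces this behavior, assuming the example data are typed in with input (firm stored as a one-character string). The if command evaluates A==0 only once, against the first observation. Because A[1] is 0 here, the block runs and keep applies to every group, including the A==1 group. Note that clear discards whatever data are in memory.

```stata
* Reproduce the example data (clear wipes the dataset in memory)
clear
input id str1 firm earnings A
1 A 100 0
1 A 200 0
2 B 50 1
2 B 70 1
3 C 900 0
end

* The if command looks only at the first observation: this is A[1]
display A[1]

* Since A[1]==0, the block executes once for the whole dataset,
* so every id-firm group is reduced to a single observation
if A==0 {
    bysort id firm: keep if _n==1
}

* Three observations remain, one per group, including group 2 (A==1)
count
```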


Solution

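  • The if exp { ... } construct is the if programming command, which is used for control flow; it is not applied observation by observation. When its expression refers to a variable, Stata evaluates it once, using the value in the first observation. As you have it now, Stata only tests whether A==0 in the first row; because that is true here, keep runs on every group. Instead, use the if qualifier on keep, which is evaluated for each observation, and combine the conditions with & (and) and | (or). Try this: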

    bys id firm: keep if (_n==1 & A==0) | A==1
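A self-contained sketch of the same idea, assuming the example data are entered with input. One extra detail: bysort does not guarantee that observations keep their original order within a group, so "first" can be ambiguous. The sketch records the original row order in a helper variable (obs_order, a hypothetical name) and uses it as a tie-breaker inside the parentheses of bysort.

```stata
* Reproduce the example data (clear wipes the dataset in memory)
clear
input id str1 firm earnings A
1 A 100 0
1 A 200 0
2 B 50 1
2 B 70 1
3 C 900 0
end

* Hypothetical helper: remember the original row order
generate long obs_order = _n

* Sort by id and firm, breaking ties by original order, then keep the
* first row of A==0 groups and every row where A==1.
* The if qualifier is evaluated separately for each observation.
bysort id firm (obs_order): keep if (_n == 1 & A == 0) | A == 1

drop obs_order
list, noobs sepby(id)
```

With the example data this keeps four observations: the first row of firm A (earnings 100), both rows of firm B, and the single row of firm C. Because the condition is checked row by row, a group whose A values differ would keep its A==1 rows, plus its first row only if that row has A==0.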