stata, missing-data, panel-data

Removing entire panel with missing values


I'm working on a panel dataset that has missing values in four variables, at the start, at the end and in the middle of panels. I would like to drop every panel that has a missing value in any of these variables.

This is the code I have tried so far:

bysort BvD_ID YEAR: drop if sum(!missing(REV_LAY,EMP_LAY,FX_ASSET_LAY,MATCOST_LAY))==0

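This removes every observation that has a missing value in any of the four variables, but it keeps the remaining observations of the same panel. Because the data are grouped by BvD_ID and YEAR together, each by-group is a single firm-year observation, so the condition is evaluated one observation at a time rather than for the whole panel.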

Example data:

  Firm_ID  Year  REV_LAY  EMP_LAY  FX_ASSET_LAY
  001      2001  80       25       120
  001      2002  75       .        122
  001      2003  82       32       128
  002      2001  40       15       45
  002      2002  42       18       48
  002      2003  45       20       50

In the sample data above, I want to drop the panel Firm_ID = 001 entirely, because EMP_LAY is missing in 2002.
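A minimal sketch of a fix that stays close to the attempt above, assuming the only change needed is to group by the firm identifier alone instead of by firm and year. It uses the example data (Firm_ID, Year and three variables rather than BvD_ID, YEAR and four); the helper variable anymiss is a hypothetical name introduced for illustration.

```stata
clear
input Firm_ID Year REV_LAY EMP_LAY FX_ASSET_LAY
  001 2001 80 25 120
  001 2002 75  . 122
  001 2003 82 32 128
  002 2001 40 15  45
  002 2002 42 18  48
  002 2003 45 20  50
end

* 1 if any of the listed variables is missing in this observation, else 0
generate byte anymiss = missing(REV_LAY, EMP_LAY, FX_ASSET_LAY)

* Group by firm only. Sorting on anymiss within each firm puts any flagged
* observation last, so anymiss[_N] is 1 if the firm has an incomplete year.
bysort Firm_ID (anymiss): drop if anymiss[_N]

* Restore the panel order and remove the helper variable
sort Firm_ID Year
drop anymiss

list
```

Sorting on the flag inside the by-group avoids building a running sum; the final sort puts the years back in order because the flag sort may have shuffled them.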


Solution

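  • You can count, within each panel, the observations that have a missing value and then drop every panel whose count is non-zero: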

    clear
    input Firm_ID  Year  REV_LAY  EMP_LAY  FX_ASSET_LAY
      001      2001  80       25       120
      001      2002  75       .        122
      001      2003  82       32       128
      002      2001  40       15       45
      002      2002  42       18       48
      002      2003  45       20       50
    end
    
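    // keep the original order of observations within each firm
    generate index = _n
    // running count, within each firm, of observations with any missing value
    bysort Firm_ID (index): generate todrop = sum(missing(REV_LAY, EMP_LAY, FX_ASSET_LAY))
    // todrop[_N] is the firm's total count; drop the whole firm if it is non-zero
    by Firm_ID: drop if todrop[_N]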
    
    list Firm_ID Year REV_LAY EMP_LAY FX_ASSET_LAY
    
       +-----------------------------------------------+
       | Firm_ID   Year   REV_LAY   EMP_LAY   FX_ASS~Y |
       |-----------------------------------------------|
    1. |       2   2001        40        15         45 |
    2. |       2   2002        42        18         48 |
    3. |       2   2003        45        20         50 |
       +-----------------------------------------------+
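An alternative sketch uses egen instead of a running sum: rowmiss() counts the missing values in each observation, and max() spreads the largest count across all observations of the firm. The variable names nmiss and panelmiss are illustrative, not part of the original answer.

```stata
clear
input Firm_ID Year REV_LAY EMP_LAY FX_ASSET_LAY
  001 2001 80 25 120
  001 2002 75  . 122
  001 2003 82 32 128
  002 2001 40 15  45
  002 2002 42 18  48
  002 2003 45 20  50
end

* Number of missing values among the listed variables in each observation
egen nmiss = rowmiss(REV_LAY EMP_LAY FX_ASSET_LAY)

* Largest count within each firm; a value above 0 means the panel has a gap
bysort Firm_ID: egen panelmiss = max(nmiss)

* Drop every observation of incomplete panels, then the helper variables
drop if panelmiss > 0
drop nmiss panelmiss

list
```

For the original dataset, replace Firm_ID with BvD_ID and add MATCOST_LAY to the rowmiss() variable list.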