
Drop all obs of group if condition is met


Suppose I have the following panel data (I left out the time variable for simplicity):

clear

input   id  var

        1   .      
        1   0      
        1   0      
        1   .     
        2   .     
        2   .     
        2   .     
        2   .     
        3   1    
        3   .     
        3   .
        3   0
end

I would like to drop every group in which all values of var are missing, so that my data look like this:

        id  var

        1   .      
        1   0      
        1   0      
        1   .      
        3   1    
        3   .     
        3   .
        3   0

I tried gen todrop = var[_N], but it doesn't work for some groups. Any thoughts? I thought about sorting by id and var and then doing a cascading replace, but I'm sure there is a better way to do this.


Solution

  • In general, you can verify whether all observations in a panel hold the same value by checking the first and last observations in each panel, after sorting appropriately. The same principle applies here. I'll use the missing() function:

    clear
    set more off
    
    input   id  myvar
            1   .      
            1   0      
            1   0      
            1   .     
            2   .     
            2   .     
            2   .     
            2   .     
            3   1    
            3   .     
            3   .
            3   0
    end
    
    bysort id (myvar) : gen todrop = missing(myvar[1]) & missing(myvar[_N])
    
    list, sepby(id)
    

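    In this case, checking only the first observation also works. Stata sorts missing values after all nonmissing values, so if the first value of myvar within an id is missing, all the others are too.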
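A minimal sketch of that shortcut, carried through to the actual drop the question asks for. The variable name allmiss is only illustrative, and the example data are repeated so the block runs on its own:

```stata
clear
input id myvar
1 .
1 0
1 0
1 .
2 .
2 .
2 .
2 .
3 1
3 .
3 .
3 0
end

* Within each id, sort on myvar so that missing values come last.
* If the first value is missing, the whole group is missing.
bysort id (myvar) : gen byte allmiss = missing(myvar[1])

* Remove the all-missing groups, then the helper variable.
drop if allmiss
drop allmiss

list, sepby(id)
```

Note that sorting on myvar reorders the observations within each id. In a real panel, you would presumably sort again on id and the time variable afterwards.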

    See help by.
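As an alternative that does not rely on the sort order of missing values, one could count the nonmissing values per group with egen and drop the groups where that count is zero. This is a sketch rather than part of the original answer; nonmiss is an illustrative name:

```stata
clear
input id myvar
1 .
1 0
1 0
1 .
2 .
2 .
2 .
2 .
3 1
3 .
3 .
3 0
end

* count() returns the number of nonmissing values of myvar in each id.
bysort id : egen nonmiss = count(myvar)

* A group with zero nonmissing values is entirely missing.
drop if nonmiss == 0
drop nonmiss

list, sepby(id)
```

See help egen for the count() function.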