Suppose I have the following panel data (I left out the time variable for simplicity):
clear
input id var
1 .
1 0
1 0
1 .
2 .
2 .
2 .
2 .
3 1
3 .
3 .
3 0
end
I would like to drop every group in which all observations are missing. That is, I want my data to look like this:
id var
1 .
1 0
1 0
1 .
3 1
3 .
3 .
3 0
I tried gen todrop = var[_N], but for some groups it doesn't work. Any thoughts? I also thought about sorting by id var and then doing a cascading replace, but I'm sure there is a better way to do this.
In general, you can verify whether all observations in a panel hold the same value by checking the first and last observations of each panel after an appropriate sort. The same principle applies here. I'll use the missing() function:
clear
set more off
input id myvar
1 .
1 0
1 0
1 .
2 .
2 .
2 .
2 .
3 1
3 .
3 .
3 0
end
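* Within each id, sort by myvar (missing values sort last), then flag
* panels whose first and last observations are both missing
bysort id (myvar) : gen todrop = missing(myvar[1]) & missing(myvar[_N])
list, sepby(id)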
In this case, checking just the first observation also works: missing values sort last, so if the first one is missing, all the others are too.
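As a minimal sketch of that shortcut, and of the drop step the question asks for, something like the following should work. It assumes the example data above (id and myvar) is in memory; the variable name allmiss is only illustrative.

```stata
* Sketch: flag all-missing panels using only the first sorted observation.
* Assumes id and myvar from the example above; allmiss is a made-up name.
bysort id (myvar) : gen byte allmiss = missing(myvar[1])

* Remove the flagged panels, then the helper variable
drop if allmiss
drop allmiss

list, sepby(id)
```

Note that bysort id (myvar) reorders observations within each id, so with a real time variable you would probably want to sort by id and time again afterwards.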
See help by.
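The general first-and-last check mentioned above can be sketched on its own as follows. The data and the names x and allsame are hypothetical, chosen only to show one constant panel and one varying panel.

```stata
clear
* Hypothetical data: x is constant in panel 1 and varies in panel 2
input id x
1 5
1 5
1 5
2 1
2 2
2 1
end

* After sorting x within id, the panel is constant exactly when
* the smallest value (first) equals the largest value (last)
bysort id (x) : gen byte allsame = x[1] == x[_N]

list, sepby(id)
```

Here allsame is 1 for id 1 and 0 for id 2. Because Stata treats . == . as true, a panel consisting entirely of . would also be flagged as constant, which is why the answer above tests missing() explicitly.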