How to find balanced panel data in R (aka, how to find which entries in panel are complete over given window)

I have a big panel of data from Compustat. To it I am adding some hand-collected data (seriously hand-collected from a stack of old books). But I don't want to hand-collect for the entire panel, only a randomly selected subset. To find the larger set (from which I'm randomly selecting) I would like to start with the balanced panel from Compustat.

I see the plm library for working with unbalanced panels, but I would like to keep it balanced. Is there a clean way to do this short of searching for and throwing out firms (individuals in panelspeak) that don't run the sample period? Thanks!

Solution

After a second thought, there is a much easier way for doing this.

Look at this:

data.with.only.complete.subjects.data <- function(xx, subject.column, number.of.observation.a.subject.should.have)
{
    subjects <- xx[,subject.column]
    num.of.observations.per.subject <- table(subjects)
    subjects.to.keep <- names(num.of.observations.per.subject)[num.of.observations.per.subject == number.of.observation.a.subject.should.have]

    subset.by.me <- subjects %in%   subjects.to.keep

    new.xx <- xx[subset.by.me ,]

    return(new.xx)
}

xx <- data.frame(subject = rep(1:4, each = 3),
            observation.per.subject = rep(rep(1:3), 4))
xx.mis <- xx[-c(2,5),]

data.with.only.complete.subjects.data(xx.mis , 1, 3)