
How to find balanced panel data in R (i.e., how to find which entries in a panel are complete over a given window)


I have a big panel of data from Compustat. To it I am adding some hand-collected data (seriously hand-collected from a stack of old books). But I don't want to hand-collect for the entire panel, only a randomly selected subset. To find the larger set (from which I'm randomly selecting) I would like to start with the balanced panel from Compustat.

I see the plm library for working with unbalanced panels, but I would like to keep the panel balanced. Is there a clean way to do this, short of searching for and throwing out firms (individuals in panelspeak) that don't span the full sample period? Thanks!


Solution

  • On second thought, there is a much easier way to do this.

    The function below keeps only the subjects that have the expected number of observations:

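    # Keep only the rows of subjects that have exactly the expected number of observations.
    data.with.only.complete.subjects.data <- function(xx, subject.column, number.of.observation.a.subject.should.have)
    {
        subjects <- xx[, subject.column]

        # Count the rows for each subject
        num.of.observations.per.subject <- table(subjects)

        # Subjects whose row count matches the expected number
        subjects.to.keep <- names(num.of.observations.per.subject)[num.of.observations.per.subject == number.of.observation.a.subject.should.have]

        # Logical index of the rows that belong to complete subjects
        subset.by.me <- subjects %in% subjects.to.keep

        new.xx <- xx[subset.by.me, ]

        return(new.xx)
    }

    # Example: 4 subjects with 3 observations each
    xx <- data.frame(subject = rep(1:4, each = 3),
                observation.per.subject = rep(1:3, 4))

    # Drop one observation from subject 1 and one from subject 2
    xx.mis <- xx[-c(2, 5), ]

    # Only subjects 3 and 4 remain
    data.with.only.complete.subjects.data(xx.mis, 1, 3)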
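
The function above counts rows per subject, so it assumes the data are already restricted to the sample period and contain no duplicated subject-period rows. Since the question asks about completeness over a given window, here is a minimal sketch of a variant that checks the periods explicitly. The helper name `balanced.over.window`, the toy data, and the use of `gvkey` and `fyear` as the firm and year columns are assumptions for illustration, not part of the original answer; columns are passed by name.

```r
# Hypothetical helper: keep only the subjects observed in every period of `window`.
balanced.over.window <- function(xx, subject.column, time.column, window)
{
    # Restrict the data to the window of interest
    in.window <- xx[xx[[time.column]] %in% window, ]

    # Distinct subject-period pairs, so duplicated rows are not counted twice
    keys <- unique(in.window[, c(subject.column, time.column)])

    # Number of distinct periods observed for each subject
    periods.per.subject <- table(keys[[subject.column]])
    subjects.to.keep <- names(periods.per.subject)[periods.per.subject == length(unique(window))]

    in.window[as.character(in.window[[subject.column]]) %in% subjects.to.keep, ]
}

# Toy panel (assumed column names): 3 firms, fiscal years 2001 to 2004
panel <- data.frame(gvkey = rep(c("A", "B", "C"), each = 4),
                    fyear = rep(2001:2004, 3))

# Drop firm B in 2002 and firm C in 2004
panel <- panel[-c(6, 12), ]

# Firms A and C are complete over 2001 to 2003; firm B is not
balanced <- balanced.over.window(panel, "gvkey", "fyear", 2001:2003)

# Draw a random subset of the balanced firms, e.g. for hand collection
set.seed(42)
sampled.firms <- sample(unique(balanced$gvkey), 1)
```

Matching on the periods themselves, rather than on a row count, means a firm with the right number of rows but the wrong years is still excluded.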