Tags: r, regression, plyr, lm, sampling

R regression on multiple samples


I am using R.

I have a panel dataset of ~5000 observations of 250 individuals over time.

I need to build a difference-in-differences regression, so I draw one random observation for each individual and run a regression:

lm(x ~ x1 + x2 + ... , data = ddply(df,.(individual),function(x) x[sample(nrow(x),1),]))

over the resulting sample.

I need to run the regression n times on n different random samples and compute the average of each coefficient estimate.

Is there a way to do this efficiently without manually computing and averaging n regressions?


Solution

  • Solved:

    I expected to find a dedicated package for this, but I wrote a function instead. For example, for n = 700:

    library(plyr)

    # One replication: draw one random row per individual, fit, return coefficients
    fun <- function() {
      alfa <- ddply(df, .(individual), function(x) x[sample(nrow(x), 1), ])
      beta <- lm(x ~ x1 + x2 + ... , data = alfa)$coefficients
      return(beta)
    }

    df.full <- replicate(700, fun())
    

    This creates a matrix with one row per coefficient (the rows are named after the coefficients) and 700 columns, one per replication. I can even do something like this:

    fun <- function() {
      alfa <- ddply(df, .(individual), function(x) x[sample(nrow(x), 1), ])
      beta <- lm(x ~ x1 + x2 + ... , data = alfa)
      # Column 1 of the coefficient table holds the estimates
      gamma <- summary(beta)[["coefficients"]][, 1]
      return(gamma)
    }

    df.full <- replicate(700, fun())
    

    Replacing [,1] with [,2] returns the standard errors instead. Since each row of df.full holds one coefficient across all replications, the averages follow directly from rowMeans(df.full).
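
For anyone who wants to try this end to end, here is a minimal self-contained sketch on simulated data. It is only an illustration under assumptions: the data frame `sim`, its columns `y`, `x1`, `x2`, the helper `draw_one_fit` and the replication count are all made up for the example, and it uses base R `split()` in place of `ddply()` so that it runs without extra packages.

```r
set.seed(1)

# Simulated panel (hypothetical data): 50 individuals, 20 observations each
sim <- data.frame(
  individual = rep(1:50, each = 20),
  x1 = rnorm(1000),
  x2 = rnorm(1000)
)
sim$y <- 1 + 2 * sim$x1 - 0.5 * sim$x2 + rnorm(1000)

# Row indices grouped by individual, computed once outside the loop
rows_by_id <- split(seq_len(nrow(sim)), sim$individual)

# One replication: draw one random row per individual, fit the model,
# and return the estimates together with their standard errors
draw_one_fit <- function() {
  idx <- vapply(rows_by_id,
                function(r) r[sample.int(length(r), 1)],
                integer(1))
  fit <- lm(y ~ x1 + x2, data = sim[idx, ])
  coef(summary(fit))[, c("Estimate", "Std. Error")]
}

n_rep <- 200
# replicate() stacks the returned matrices into a 3-d array:
# coefficient x statistic x replication
fits <- replicate(n_rep, draw_one_fit())

# Average each coefficient and each standard error over the replications
mean_est <- rowMeans(fits[, "Estimate", ])
mean_se  <- rowMeans(fits[, "Std. Error", ])
```

Returning both columns from a single fit avoids running the whole resampling twice to get estimates and standard errors separately. Indexing with `sample.int(length(r), 1)` also stays correct if an individual has only one row.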