Search code examples
rfor-loopsampling

Working with a random sample within a for loop


I wish to draw 1,000 random samples of size 50 from the dataset and show that E(xi^ui) = 0 holds for each simulated sample. My code is below, and I have been trying to debug for some time now, but I can't figure out what's wrong.

The dataset is called 'dataset' and it has two columns: 'y' and 'x'. I want to regress y on x, get the residuals, and show that they are not correlated with x.

lm.fit <- NA
resid.lm.fit <- NA
correlation <- NA

for (i in 1:1000){
  samp1 <- sample(dataset, size=50, replace=T)
  lm.fit[i] <- lm(y ~ x, data=samp1)
  resid.lm.fit[i]<-resid(lm.fit[i])
  correlation[i] <- cor.test(resid.lm.fit[i],samp1$x)
}

The errors I am getting are:

Error in resid.lm.fit[i] <- resid(lm.fit[i]) : 
  replacement has length zero
In addition: Warning message:
In lm.fit[i] <- lm(y ~ x, data = samp1) :
  number of items to replace is not a multiple of replacement length

Solution

  • The problem is that you are trying to store a bunch of non-atomic objects in vectors. If you make lm.fit, resid.lm.fit, and correlation lists instead of vectors you will be okay:

    set.seed(123)
    dataset <- data.frame(
      x=1:250,
      y=3*(1:250)+rnorm(250,0,40))
    ##
    lm.fit <- list(NULL)
    resid.lm.fit <- list(NULL)
    correlation <- list(NULL)
    for (i in 1:1000){
      samp1 <- dataset[sample(1:nrow(dataset), size=50, replace=T),]
      lm.fit[[i]] <- lm(y ~ x, data=samp1)
      resid.lm.fit[[i]] <- resid(lm.fit[[i]])
      correlation[[i]] <- cor.test(resid.lm.fit[[i]],samp1$x)
    }
    

    Then you can check the results of individual cor.tests like this:

    > correlation[[1]]
    
        Pearson's product-moment correlation
    
    data:  resid.lm.fit[[i]] and samp1$x
    t = 0, df = 48, p-value = 1
    alternative hypothesis: true correlation is not equal to 0
    95 percent confidence interval:
     -0.2783477  0.2783477
    sample estimates:
             cor 
    2.991262e-17 
    

    Also, to sample from a data.frame use df[ sample(1:nrow(df),...), ], not sample(df,...).