Tags: r, regression, linear-regression, sample

Does a bootstrapped outcome variable line up with the x values in a regression in R?


I am trying to run a regression on a bootstrapped sample in R.

The original sample is a data frame (referred to as df) with hundreds of rows, structured like the one below. y is the outcome variable, and treat is 0 or 1.

y  treat
3  0
5  1
2  0
4  1

I have sampled with replacement to generate 900 observations from df$y.

set.seed(5)
b1 <- sample(df$y, 900, replace = TRUE, prob = NULL)

I then ran the following regression:

lm(b1 ~ treat, df)

When I use the sample b1 as the outcome in the regression, does R automatically match each value of b1 with the corresponding treat value from the original data frame? If I want the outcome values in b1 to correspond to the correct treat values, do I need to do something differently? How can I check that this is the regression I intend to run?
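A quick way to see what the formula is actually pairing up, sketched here with the four-row example data (the real df has hundreds of rows): compare the lengths of the two variables and capture the error lm() raises. The variable b2 is a hypothetical extra resample added only to show that matching lengths would not restore the pairing either.

```r
# Example data from the question (4 rows; the real df has hundreds)
df <- data.frame(y = c(3L, 5L, 2L, 4L), treat = c(0L, 1L, 0L, 1L))

set.seed(5)
b1 <- sample(df$y, 900, replace = TRUE)  # resamples y only, not whole rows

# Every variable in a formula must have the same length
length(b1)        # 900
length(df$treat)  # 4

# So lm() refuses to fit; capture the error message instead of stopping
fit <- tryCatch(lm(b1 ~ treat, data = df),
                error = function(e) conditionMessage(e))
fit

# Hypothetical b2: with a matching length lm() would run, but each b2[i] is
# a random y value drawn independently of df$treat[i], so pairing is lost
b2 <- sample(df$y, nrow(df), replace = TRUE)
data.frame(b2, treat = df$treat)
```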


Solution

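  • Sample from the sequence of row indices instead of a single column. The OP's code resamples only 'y', so b1 has 900 elements while 'treat' keeps its original length (4 in the example data below). The formula method requires all variables to have the same length, so lm() stops with an error. Even if the lengths matched, b1 would be drawn independently of 'treat', so the y and treat values would no longer be paired.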

    lm(b1 ~ treat, df)   
    

    Error in model.frame.default(formula = b1 ~ treat, data = df, drop.unused.levels = TRUE) : variable lengths differ (found for 'treat')

    Instead, sample row indices so that each y value stays with its own treat value:

    set.seed(5)
    # draw 900 row indices with replacement; whole rows are kept, so y and treat stay paired
    df1 <- df[sample(seq_len(nrow(df)), 900, replace = TRUE), ]
    lm(y ~ treat, df1)
    

    data

    df <- structure(list(y = c(3L, 5L, 2L, 4L), treat = c(0L, 1L, 0L, 1L
    )), class = "data.frame", row.names = c(NA, -4L))
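To answer the "how can I check" part, here is a base R sketch using the example data above. It first confirms that every (y, treat) pair in the resampled data is a pair that exists in the original df, then repeats the row resampling to get a bootstrap distribution of the treat coefficient. The replicate count (1000), the per-replicate size nrow(df), and the percentile interval are illustrative assumptions, not part of the original answer.

```r
# Example data from the answer; the real df would have hundreds of rows
df <- data.frame(y = c(3L, 5L, 2L, 4L), treat = c(0L, 1L, 0L, 1L))

set.seed(5)

# One bootstrap sample of 900 rows, as in the answer: resample row indices
df1 <- df[sample(seq_len(nrow(df)), 900, replace = TRUE), ]

# Check the pairing: every (y, treat) combination in df1 must be a row of df
pairs_orig <- paste(df$y, df$treat)
pairs_boot <- paste(df1$y, df1$treat)
all(pairs_boot %in% pairs_orig)  # TRUE, because whole rows were resampled

# Repeat the row resampling to build a bootstrap distribution of the treat
# coefficient; each replicate draws nrow(df) rows, the usual bootstrap size
n_boot <- 1000  # illustrative number of replicates
boot_coefs <- replicate(n_boot, {
  rows <- sample(seq_len(nrow(df)), nrow(df), replace = TRUE)
  fit <- lm(y ~ treat, data = df[rows, ])
  unname(coef(fit)["treat"])  # NA if the draw contains only one treat group
})

# Percentile 95% interval, dropping NA replicates
quantile(boot_coefs, c(0.025, 0.975), na.rm = TRUE)
```

With only four rows, some replicates contain a single treat group and give an NA coefficient; with the full data set of hundreds of rows that becomes very unlikely. The boot package's boot() function wraps the same resample-and-refit idea if you prefer not to write the loop yourself.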