
r - How to fit lm() to each data frame in a list?


After simulating 100,000 observations from a data-generating process (DGP) and splitting them into a list of 1000 data frames with 100 observations each, I would like to fit the same equation to each data frame separately. How can I get a separate set of coefficients for each data frame?

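# True parameters of the data-generating process
alpha  <- 6
beta_1 <- 0.5
beta_2 <- 0.1

# Simulate 100,000 observations
X_i <- rnorm(n = 100000, mean = 5, sd = 2)
X_i_squared <- X_i^2
e_i <- rnorm(n = 100000, mean = 0, sd = 1)
Y_i <- alpha + beta_1 * X_i + beta_2 * X_i_squared + e_i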

df <- data.frame(Y_i, X_i, X_i_squared, e_i)

Splitted_df <- split(df, rep(1:1000, each = 100))

I used split() to turn the original data frame into a list of 1000 new data frames, but I am not sure how to proceed. Do I need one of the functions from the apply family? If anyone could help, I would really appreciate it!


Solution

  • Using lapply you could create a list of models like so:

    mods <- lapply(Splitted_df, function(x) lm(Y_i ~ X_i + X_i_squared, data = x))
    

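    Then, using purrr::map_df for convenience together with broom::tidy, you could collect the coefficients of all models in a single data frame, with the model column (set via .id) identifying which data frame each row came from: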

    mods_tidy <- purrr::map_df(mods, broom::tidy, .id = "model")
    
    head(mods_tidy)
    #> # A tibble: 6 × 6
    #>   model term        estimate std.error statistic  p.value
    #>   <chr> <chr>          <dbl>     <dbl>     <dbl>    <dbl>
    #> 1 1     (Intercept)   5.79      0.475      12.2  2.83e-21
    #> 2 1     X_i           0.591     0.170       3.48 7.51e- 4
    #> 3 1     X_i_squared   0.0942    0.0147      6.39 5.84e- 9
    #> 4 2     (Intercept)   6.38      0.521      12.3  2.07e-21
    #> 5 2     X_i           0.410     0.220       1.86 6.53e- 2
    #> 6 2     X_i_squared   0.107     0.0220      4.86 4.55e- 6
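
If you would rather stay in base R, the same list of models can be reduced to a plain coefficient matrix with sapply() and coef(). The following is a minimal sketch, not part of the answer above: it assumes a smaller simulation (10 data frames instead of 1000) so that it runs quickly on its own, and the seed and the names n_groups, n_per_group and coef_mat are made up for illustration.

```r
set.seed(42)  # assumption: seed added only to make the sketch reproducible

# Smaller version of the simulation: 10 groups of 100 observations each
n_groups    <- 10
n_per_group <- 100
n           <- n_groups * n_per_group

X_i <- rnorm(n, mean = 5, sd = 2)
e_i <- rnorm(n, mean = 0, sd = 1)
Y_i <- 6 + 0.5 * X_i + 0.1 * X_i^2 + e_i
df  <- data.frame(Y_i, X_i, X_i_squared = X_i^2)

# Split into a list of data frames and fit the same model to each one
Splitted_df <- split(df, rep(seq_len(n_groups), each = n_per_group))
mods <- lapply(Splitted_df, function(x) lm(Y_i ~ X_i + X_i_squared, data = x))

# sapply() returns one column per model; transpose so each row is a model
coef_mat <- t(sapply(mods, coef))

dim(coef_mat)       # one row per data frame, one column per coefficient
colMeans(coef_mat)  # average estimate of each coefficient across data frames
```

Each row of coef_mat corresponds to one data frame, so column summaries such as colMeans() or apply(coef_mat, 2, sd) give a quick view of how the estimates vary across samples. Unlike broom::tidy, this keeps only the point estimates and drops standard errors and p-values.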