Performing a linear model in R of a single response with a single predictor from a large dataframe and repeat for each column

It might not be very clear from the title but what I wish to do is:

I have a dataframe df with, say, 200 columns and the first 80 columns are response variables (y1, y2, y3, ...) and the rest of 120 are predictors (x1, x2, x3, ...).
I wish to compute a linear model for each pair – lm(yi ~ xi, data = df).
Many problems and solutions I have looked through online have a either a fixed response vs many predictors or the other way around, using lapply() and its related functions.

Could anyone who is familiar with it point me to the right step?

Solution

use tidyverse

library(tidyverse)
library(broom)
df <- mtcars
y <- names(df)[1:3]
x <- names(df)[4:7]

result <- expand_grid(x, y) %>% 
  rowwise() %>% 
  mutate(frm = list(reformulate(x, y)),
         model = list(lm(frm, data = df))) 

result$model <- purrr::set_names(result$model, nm = paste0(result$y, " ~ ", result$x))

result$model[1:2]
#> $`mpg ~ hp`
#> 
#> Call:
#> lm(formula = frm, data = df)
#> 
#> Coefficients:
#> (Intercept)           hp  
#>    30.09886     -0.06823  
#> 
#> 
#> $`cyl ~ hp`
#> 
#> Call:
#> lm(formula = frm, data = df)
#> 
#> Coefficients:
#> (Intercept)           hp  
#>     3.00680      0.02168

map_df(result$model, tidy)
#> # A tibble: 24 x 5
#>    term        estimate std.error statistic  p.value
#>    <chr>          <dbl>     <dbl>     <dbl>    <dbl>
#>  1 (Intercept)  30.1      1.63       18.4   6.64e-18
#>  2 hp           -0.0682   0.0101     -6.74  1.79e- 7
#>  3 (Intercept)   3.01     0.425       7.07  7.41e- 8
#>  4 hp            0.0217   0.00264     8.23  3.48e- 9
#>  5 (Intercept)  21.0     32.6         0.644 5.25e- 1
#>  6 hp            1.43     0.202       7.08  7.14e- 8
#>  7 (Intercept)  -7.52     5.48       -1.37  1.80e- 1
#>  8 drat          7.68     1.51        5.10  1.78e- 5
#>  9 (Intercept)  14.6      1.58        9.22  2.93e-10
#> 10 drat         -2.34     0.436      -5.37  8.24e- 6
#> # ... with 14 more rows

map_df(result$model, glance)
#> # A tibble: 12 x 12
#>    r.squared adj.r.squared  sigma statistic  p.value    df logLik   AIC   BIC
#>        <dbl>         <dbl>  <dbl>     <dbl>    <dbl> <dbl>  <dbl> <dbl> <dbl>
#>  1     0.602         0.589   3.86     45.5  1.79e- 7     1  -87.6 181.  186. 
#>  2     0.693         0.683   1.01     67.7  3.48e- 9     1  -44.6  95.1  99.5
#>  3     0.626         0.613  77.1      50.1  7.14e- 8     1 -183.  373.  377. 
#>  4     0.464         0.446   4.49     26.0  1.78e- 5     1  -92.4 191.  195. 
#>  5     0.490         0.473   1.30     28.8  8.24e- 6     1  -52.7 111.  116. 
#>  6     0.504         0.488  88.7      30.5  5.28e- 6     1 -188.  382.  386. 
#>  7     0.753         0.745   3.05     91.4  1.29e-10     1  -80.0 166.  170. 
#>  8     0.612         0.599   1.13     47.4  1.22e- 7     1  -48.3 103.  107. 
#>  9     0.789         0.781  57.9     112.   1.22e-11     1 -174.  355.  359. 
#> 10     0.175         0.148   5.56      6.38 1.71e- 2     1  -99.3 205.  209. 
#> 11     0.350         0.328   1.46     16.1  3.66e- 4     1  -56.6 119.  124. 
#> 12     0.188         0.161 114.        6.95 1.31e- 2     1 -196.  398.  402. 
#> # ... with 3 more variables: deviance <dbl>, df.residual <int>, nobs <int>

^{Created on 2020-12-11 by the reprex package (v0.3.0)}