I am working on a multiple regression problem with several different outcomes and different predictors. I have written a script that takes the outcomes and predictors and iterates over a regression function in a for loop. What I am looking for is something like purrr::map, or any of its variants, to replace the for loop.
For illustrative purposes, I am sharing some synthetic data that mimics the original data, along with the script I have written.
# create synthetic data
df <- data.frame(y1  = sample(rnorm(n=50, mean=3.25, sd=.25), replace=TRUE),
                 y2  = sample(rnorm(n=50, mean=3.75, sd=.48), replace=TRUE),
                 x1  = sample(rnorm(n=50, mean=4.28, sd=.32), replace=TRUE),
                 x2  = sample(rnorm(n=50, mean=3.75, sd=.64), replace=TRUE),
                 x3  = sample(rnorm(n=50, mean=3.99, sd=.55), replace=TRUE),
                 x4  = sample(runif(n=50, min=1L, max=2L), replace=TRUE),
                 wgt = sample(runif(n=50, min=.20, max=.75), replace=TRUE))
# regression function: the response is passed as a string, the predictors as bare names
reg_func <- function(y, ...){
  # capture the unquoted predictor names supplied via ... as character strings
  x <- sapply(substitute(...()), deparse)
  # build a formula of the form y ~ x1 + x2
  f <- reformulate(termlabels = x, response = y)
  model <- lm(f, data = df, weights = wgt, na.action = na.omit)
  anova_table <- anova(model)
  jt <- jtools::summ(model, confint = TRUE, ci.width = 0.95, robust = FALSE, vifs = TRUE)
  list(outcome = y, model_summary = summary(model),
       anova_table = anova_table, jtools_summary = jt)
}
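For example, a single call returns a list with the outcome name and the three summaries; str() is used here just to show the top-level structure:
# quick check of what one call returns (top level only)
str(reg_func(y = "y1", x1, x2), max.level = 1)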
# select the dvs and set names
dvs <- names(df)[1:2]
dvs <- purrr::set_names(dvs)
I am looking for something that simplifies the looping part of my script below
# loop over the dependent variables and store the results
reg_out  <- list()
reg_out2 <- list()
for (i in seq_along(dvs)){
  reg_out[[i]]  <- reg_func(y = dvs[i], x1, x2)
  reg_out2[[i]] <- reg_func(y = dvs[i], x3, x4)
}
reg_out
reg_out2
Just use map() on the dvs vector itself and output a named list, so you can access each dependent variable by name and then the models you want.
library(purrr)
library(jtools)

reg_out <- map(
  dvs,
  ~ list(
    m1 = reg_func(y = .x, x1, x2),
    m2 = reg_func(y = .x, x3, x4)
  )
)
reg_out$y1$m1
#> $outcome
#> [1] "y1"
#>
#> $model_summary
#>
#> Call:
#> lm(formula = f, data = df, weights = wgt, na.action = na.omit)
#>
#> Weighted Residuals:
#> Min 1Q Median 3Q Max
#> -0.45116 -0.15721 0.03369 0.11011 0.42508
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 3.16364 0.73468 4.306 8.39e-05 ***
#> x1 0.04892 0.15780 0.310 0.758
#> x2 -0.02670 0.06838 -0.390 0.698
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 0.2102 on 47 degrees of freedom
#> Multiple R-squared: 0.00565, Adjusted R-squared: -0.03666
#> F-statistic: 0.1335 on 2 and 47 DF, p-value: 0.8753
#>
#>
#> $anova_table
#> Analysis of Variance Table
#>
#> Response: y1
#> Df Sum Sq Mean Sq F value Pr(>F)
#> x1 1 0.00506 0.005064 0.1146 0.7365
#> x2 1 0.00674 0.006736 0.1524 0.6980
#> Residuals 47 2.07680 0.044187
#>
#> $jtools_summary
#> MODEL INFO:
#> Observations: 50
#> Dependent Variable: y1
#> Type: OLS linear regression
#>
#> MODEL FIT:
#> F(2,47) = 0.13, p = 0.88
#> R² = 0.01
#> Adj. R² = -0.04
#>
#> Standard errors: OLS
#> ----------------------------------------------------------------
#> Est. 2.5% 97.5% t val. p VIF
#> ----------------- ------- ------- ------- -------- ------ ------
#> (Intercept) 3.16 1.69 4.64 4.31 0.00
#> x1 0.05 -0.27 0.37 0.31 0.76 1.01
#> x2 -0.03 -0.16 0.11 -0.39 0.70 1.01
#> ----------------------------------------------------------------
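If you later want to pull one piece out of every fitted model, you can map over the result again; for example (using the anova_table and jtools_summary element names returned by reg_func above):
# collect the ANOVA tables from the x1 + x2 models for every outcome
map(reg_out, ~ .x$m1$anova_table)

# or just the jtools summaries
map(reg_out, ~ .x$m1$jtools_summary)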