I am working on a multiple regression problem with several different outcomes and different predictors. I have written a script that takes the outcomes and predictors and iterates over a regression function in a for loop. What I am looking for is something like purrr::map, or any of its variants, to replace the for loop.
For illustrative purposes, I am sharing some synthetic data that mimics the original data, along with the script I have written.
# create synthetic data
df <- data.frame(y1  = sample(rnorm(n=50, mean=3.25, sd=.25), replace=TRUE),
                 y2  = sample(rnorm(n=50, mean=3.75, sd=.48), replace=TRUE),
                 x1  = sample(rnorm(n=50, mean=4.28, sd=.32), replace=TRUE),
                 x2  = sample(rnorm(n=50, mean=3.75, sd=.64), replace=TRUE),
                 x3  = sample(rnorm(n=50, mean=3.99, sd=.55), replace=TRUE),
                 x4  = sample(runif(n=50, min=1L, max=2L), replace=TRUE),
                 wgt = sample(runif(n=50, min=.20, max=.75), replace=TRUE))
# regression function: the response is passed as a string, the predictors as bare names
reg_func <- function(y, ...){
  # capture the unquoted predictor names supplied via ... as character strings
  x <- sapply(substitute(...()), deparse)
  # build a formula of the form y ~ x1 + x2
  f <- reformulate(termlabels = x, response = y)
  model <- lm(f, data = df, weights = wgt, na.action = na.omit)
  anova_table <- anova(model)
  jt <- jtools::summ(model, confint = TRUE, ci.width = 0.95, robust = FALSE, vifs = TRUE)
  list(outcome = y, model_summary = summary(model),
       anova_table = anova_table, jtools_summary = jt)
}
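For example, a single call returns a list with the outcome name and the three summaries; str() is used here just to show the top-level structure:
# quick check of what one call returns (top level only)
str(reg_func(y = "y1", x1, x2), max.level = 1)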
# select the dvs and set names
dvs <- names(df)[1:2]
dvs <- purrr::set_names(dvs)
I am looking for something that simplifies the looping part of my script below
# loop over the dependent variables and store the results
reg_out  <- list()
reg_out2 <- list()
for (i in seq_along(dvs)){
  reg_out[[i]]  <- reg_func(y = dvs[i], x1, x2)
  reg_out2[[i]] <- reg_func(y = dvs[i], x3, x4)
}
reg_out
reg_out2
Just use map() on the dvs vector itself and output a named list, so you can access each dependent variable by name and then the models you want.
library(purrr)
library(jtools)

reg_out <- map(
  dvs,
  ~ list(
    m1 = reg_func(y = .x, x1, x2),
    m2 = reg_func(y = .x, x3, x4)
  )
)
reg_out$y1$m1
#> $outcome
#> [1] "y1"
#>
#> $model_summary
#>
#> Call:
#> lm(formula = f, data = df, weights = wgt, na.action = na.omit)
#>
#> Weighted Residuals:
#> Min 1Q Median 3Q Max
#> -0.45116 -0.15721 0.03369 0.11011 0.42508
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 3.16364 0.73468 4.306 8.39e-05 ***
#> x1 0.04892 0.15780 0.310 0.758
#> x2 -0.02670 0.06838 -0.390 0.698
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 0.2102 on 47 degrees of freedom
#> Multiple R-squared: 0.00565, Adjusted R-squared: -0.03666
#> F-statistic: 0.1335 on 2 and 47 DF, p-value: 0.8753
#>
#>
#> $anova_table
#> Analysis of Variance Table
#>
#> Response: y1
#> Df Sum Sq Mean Sq F value Pr(>F)
#> x1 1 0.00506 0.005064 0.1146 0.7365
#> x2 1 0.00674 0.006736 0.1524 0.6980
#> Residuals 47 2.07680 0.044187
#>
#> $jtools_summary
#> MODEL INFO:
#> Observations: 50
#> Dependent Variable: y1
#> Type: OLS linear regression
#>
#> MODEL FIT:
#> F(2,47) = 0.13, p = 0.88
#> R² = 0.01
#> Adj. R² = -0.04
#>
#> Standard errors: OLS
#> ----------------------------------------------------------------
#> Est. 2.5% 97.5% t val. p VIF
#> ----------------- ------- ------- ------- -------- ------ ------
#> (Intercept) 3.16 1.69 4.64 4.31 0.00
#> x1 0.05 -0.27 0.37 0.31 0.76 1.01
#> x2 -0.03 -0.16 0.11 -0.39 0.70 1.01
#> ----------------------------------------------------------------
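If you later want to pull one piece out of every fitted model, you can map over the result again; for example (using the anova_table and jtools_summary element names returned by reg_func above):
# collect the ANOVA tables from the x1 + x2 models for every outcome
map(reg_out, ~ .x$m1$anova_table)

# or just the jtools summaries
map(reg_out, ~ .x$m1$jtools_summary)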