How can I loop over rows of a data frame containing variable names permutations when running a mediation analysis with lavaan
?
Say I have 4 variables var1
, var2
, var3
, var4
:
df<- data.frame(var1 = rnorm(100),
var2 = rnorm(100),
var3 = rnorm(100),
var4 = rnorm(100))
Using gtools::permutations()
I save all possible permutations of the 4 variables in sets of 3:
permut <-
gtools::permutations(n = 4, r = 3, v = names(df), repeats.allowed = FALSE)
colnames(permut) <- c("Y", "X", "M")
> head(permut)
Y X M
[1,] "var1" "var2" "var3"
[2,] "var1" "var2" "var4"
[3,] "var1" "var3" "var2"
[4,] "var1" "var3" "var4"
[5,] "var1" "var4" "var2"
[6,] "var1" "var4" "var3"
Then I set the medation model using lavaan
syntax, where I'm interested in the mediating effect of M
over the relationship between X
and Y
:
mod <- "
M ~ a * X
Y ~ c * X + b * M
ind := a*b
tot := c + (a*b)
"
I want to run the model and store it's results for future inspection:
library(lavaan)
library(dplyr)
#fit the model
fit <- sem(mod, df, se = "robust")
#save results
result <-
parameterestimates(fit) %>% filter(op != "~~")
My question here is this:
Y,X,M
variable names from each row of permut
, fit the model using data from df
and model syntax from mod
and eventually store results for every model fit?The code above is the simplest possible scenario that I'd like to use for running more complex models the same way.
I'm aware of answers about looping linear models of different variables: loop over all possible combinations, Looping over combinations of regression model terms, Linear Regression loop for each independent variable individually against dependent, and possibly the closest one: How to use reference variables by character string in a formula?, but still I'm stuck and coudn't solve this over the weekend.
Here's one way of doing it:
fits <- apply(permut, 1, function (p) {
permuted.df <- df[p]
colnames(permuted.df) <- names(p)
sem(mod, permuted.df, se="robust")
})
fits
contains the SEM results for every 3-permutation in permut
. To see the estimates of, e.g., the first fit, you can proceed as usual:
> parameterestimates(fits[[1]]) %>% filter(op != "~~")
lhs op rhs label est se z pvalue ci.lower
1 M ~ X a -0.18393765 0.10977670 -1.6755618 0.09382406 -0.39909603
2 Y ~ X c 0.07314372 0.09891034 0.7394952 0.45960637 -0.12071699
3 Y ~ M b 0.01944518 0.08852450 0.2196587 0.82613697 -0.15405965
4 ind := a*b ind -0.00357670 0.01600038 -0.2235385 0.82311644 -0.03493686
5 tot := c+(a*b) tot 0.06956702 0.09816192 0.7086966 0.47851276 -0.12282680
ci.upper
1 0.03122074
2 0.26700443
3 0.19295001
4 0.02778346
5 0.26196084