My goal is to generate a list of dataframes from a lavaan
model using simsem::sim
. Depending on if the dataOnly
argument is TRUE or FALSE, simsem::sim
will either generate a SimResult
object (which includes generated data plus parameter values, fit indices, etc.) or just a list of the generated dataframes.
If I impose any missingness on the generated dataframes, I am incapable of generating the full SimResult
object. I can either generate a SimResult
object with no missingness or I can generate a list of dataframes with missingness. Trying to generate the SimResult
object with missingness gives the below error message, but I couldn't find any argument for sem
, simsem::miss
, or simsem::sim
where I could specify to use the D2 pooling method. Any help would be appreciated.
#> Error in D3.LRT(object, h1 = h1, useImps = useImps, asymptotic = asymptotic, : D3 test statistic could not be calculated. Try the D2 pooling method.
set.seed(123)
suppressMessages(library(mice))
suppressMessages(library(lavaan))
suppressMessages(library(simsem))
data(mtcars)
model <- 'gear ~ carb'
fit <- lavaan::sem(model, data = mtcars)
make_missing <- simsem::miss(package = "mice", m = 2, maxit = 2, seed = 123)
thisworks <- simsem::sim(
nRep = 10,
model = fit,
n = 5)
#> Progress: 1 / 10
#> Progress: 2 / 10
#> Progress: 3 / 10
#> Progress: 4 / 10
#> Progress: 5 / 10
#> Progress: 6 / 10
#> Progress: 7 / 10
#> Progress: 8 / 10
#> Progress: 9 / 10
#> Progress: 10 / 10
class(thisworks)
#> [1] "SimResult"
#> attr(,"package")
#> [1] "simsem"
thisdoesnotwork <- simsem::sim(
nRep = 10,
model = fit,
n = 5,
miss = make_missing)
#> Progress: 1 / 10
#> Error in D3.LRT(object, h1 = h1, useImps = useImps, asymptotic = asymptotic, : D3 test statistic could not be calculated. Try the D2 pooling method.
thispartiallyworks <- simsem::sim(
nRep = 10,
model = fit,
n = 5,
dataOnly = TRUE,
miss = make_missing)
#> Progress: 1 / 10
#> Progress: 2 / 10
#> Progress: 3 / 10
#> Progress: 4 / 10
#> Progress: 5 / 10
#> Progress: 6 / 10
#> Progress: 7 / 10
#> Progress: 8 / 10
#> Progress: 9 / 10
#> Progress: 10 / 10
class(thispartiallyworks)
#> [1] "list"
My goal is to generate a list of dataframes from a
lavaan
model usingsimsem::sim
I'm confused, because neither the lavaan()
nor sim()
function returns a data.frame
object.
Depending on if the
dataOnly
argument ...
OK, so you do want the generated data (in a list), to which the model would be fit by lavaan()
within sim()
. In that case, the reason you are getting complete data is because you didn't choose any missing-data mechanism when you created the SimMissing
object make_missing
. Observe an alternative to your thispartiallyworks
example:
make_missing <- miss(package = "mice", m = 2, maxit = 2,
seed = 123, pmMCAR = .3)
set.seed(123)
incompleteList <- sim(nRep = 10, model = fit, n = 5,
dataOnly = TRUE, miss = make_missing)
incompleteList # list of data.frames with NAs
The thisdoesnotwork
example is still a problem, but now because mice()
detects collinearity (no surprise with only 5 observations, some of which are incomplete). Increasing the sample size to n=50
got me back to your original error, to which there is no solution implemented in simsem
(i.e., there is no way to pass arguments to semTools
methods for lavaan.mi
objects; see more on this below). But I suspect the error may be due to your model being saturated (df = 0). When I ran a larger model (with df > 0), it worked fine:
model <- ' mpg ~ carb + gear
carb + gear ~ cyl
'
fit <- lavaan::sem(model, data = mtcars)
thisworksnow <- simsem::sim(
nRep = 10,
model = fit,
n = 50,
miss = make_missing)
When I add the direct effect of cyl
on mpg
and the residual covariance between carb
and gear
, the model is once again saturated, so the pooling method does not work.
model <- ' mpg ~ carb + gear + cyl
carb + gear ~ cyl
carb ~~ gear
'
fit <- lavaan::sem(model, data = mtcars)
cannotwork <- simsem::sim(
nRep = 10,
model = fit,
n = 50,
miss = make_missing) # original error
Since sim
does not have a mechanism to "turn off" the collection of fitMeasures()
to store in the SimResult
object's @fit
slot, I suppose a solution would be to update semTools:::D3.LRT()
to return a pooled chisq
and df
= 0 for saturated models. I posted an issue on GitHub to get that done, so hopefully this error can be avoided in the future. You can watch for updates there: