Search code examples
rmontecarlor-micer-lavaan

Monte Carlo with simsem and lavaan: "Error in D3.LRT ... Try the D2 pooling method."


My goal is to generate a list of dataframes from a lavaan model using simsem::sim. Depending on if the dataOnly argument is TRUE or FALSE, simsem::sim will either generate a SimResult object (which includes generated data plus parameter values, fit indices, etc.) or just a list of the generated dataframes.

If I impose any missingness on the generated dataframes, I am incapable of generating the full SimResult object. I can either generate a SimResult object with no missingness or I can generate a list of dataframes with missingness. Trying to generate the SimResult object with missingness gives the below error message, but I couldn't find any argument for sem, simsem::miss, or simsem::sim where I could specify to use the D2 pooling method. Any help would be appreciated.

#> Error in D3.LRT(object, h1 = h1, useImps = useImps, asymptotic = asymptotic, : D3 test statistic could not be calculated. Try the D2 pooling method.

set.seed(123)
suppressMessages(library(mice))
suppressMessages(library(lavaan))
suppressMessages(library(simsem))

data(mtcars)
model <- 'gear ~ carb'
fit <- lavaan::sem(model, data = mtcars)
make_missing <- simsem::miss(package = "mice", m = 2, maxit = 2, seed = 123)

thisworks <- simsem::sim(
  nRep = 10,
  model = fit,
  n = 5)
#> Progress: 1 / 10 
#> Progress: 2 / 10 
#> Progress: 3 / 10 
#> Progress: 4 / 10 
#> Progress: 5 / 10 
#> Progress: 6 / 10 
#> Progress: 7 / 10 
#> Progress: 8 / 10 
#> Progress: 9 / 10 
#> Progress: 10 / 10
class(thisworks)
#> [1] "SimResult"
#> attr(,"package")
#> [1] "simsem"

thisdoesnotwork <- simsem::sim(
  nRep = 10,
  model = fit,
  n = 5,
  miss = make_missing)
#> Progress: 1 / 10
#> Error in D3.LRT(object, h1 = h1, useImps = useImps, asymptotic = asymptotic, : D3 test statistic could not be calculated. Try the D2 pooling method.

thispartiallyworks <- simsem::sim(
  nRep = 10,
  model = fit,
  n = 5,
  dataOnly = TRUE,
  miss = make_missing)
#> Progress: 1 / 10 
#> Progress: 2 / 10 
#> Progress: 3 / 10 
#> Progress: 4 / 10 
#> Progress: 5 / 10 
#> Progress: 6 / 10 
#> Progress: 7 / 10 
#> Progress: 8 / 10 
#> Progress: 9 / 10 
#> Progress: 10 / 10
class(thispartiallyworks)
#> [1] "list"

Solution

  • My goal is to generate a list of dataframes from a lavaan model using simsem::sim

    I'm confused, because neither the lavaan() nor sim() function returns a data.frame object.

    Depending on if the dataOnly argument ...

    OK, so you do want the generated data (in a list), to which the model would be fit by lavaan() within sim(). In that case, the reason you are getting complete data is because you didn't choose any missing-data mechanism when you created the SimMissing object make_missing. Observe an alternative to your thispartiallyworks example:

    make_missing <- miss(package = "mice", m = 2, maxit = 2,
                         seed = 123, pmMCAR = .3)
    set.seed(123)
    incompleteList <- sim(nRep = 10, model = fit, n = 5,
                          dataOnly = TRUE, miss = make_missing)
    incompleteList # list of data.frames with NAs
    

    The thisdoesnotwork example is still a problem, but now because mice() detects collinearity (no surprise with only 5 observations, some of which are incomplete). Increasing the sample size to n=50 got me back to your original error, to which there is no solution implemented in simsem (i.e., there is no way to pass arguments to semTools methods for lavaan.mi objects; see more on this below). But I suspect the error may be due to your model being saturated (df = 0). When I ran a larger model (with df > 0), it worked fine:

    model <- ' mpg ~ carb + gear
       carb + gear ~ cyl
    '
    fit <- lavaan::sem(model, data = mtcars)
    thisworksnow <- simsem::sim(
        nRep = 10,
        model = fit,
        n = 50,
        miss = make_missing)
    

    When I add the direct effect of cyl on mpg and the residual covariance between carb and gear, the model is once again saturated, so the pooling method does not work.

    model <- '  mpg ~ carb + gear + cyl
       carb  + gear ~ cyl
       carb ~~ gear
    '
    fit <- lavaan::sem(model, data = mtcars)
    cannotwork <- simsem::sim(
        nRep = 10,
        model = fit,
        n = 50,
        miss = make_missing) # original error
    

    Since sim does not have a mechanism to "turn off" the collection of fitMeasures() to store in the SimResult object's @fit slot, I suppose a solution would be to update semTools:::D3.LRT() to return a pooled chisq and df = 0 for saturated models. I posted an issue on GitHub to get that done, so hopefully this error can be avoided in the future. You can watch for updates there:

    https://github.com/simsem/semTools/issues/113