I am estimating an SEM model that has observed variables. I want to use SEM to handle missing data using FIML. My model has an interaction term to test for moderation (continuous predictor by 3-category moderator variable). I want all lines of the interaction to appear in the same plot, so I assume I need to keep the 3-category moderator as a single variable rather than dummy code it. I get the following error when trying to estimate the model:
Error in data[[NAMES[1L]]] * data[[NAMES[2L]]] : non-numeric argument to binary operator
Here is a toy example that illustrates the issue.
library(lavaan)
library(car)
library(dplyr)
library(emmeans)
library(semTools)
data(starwars)
sw2 <- starwars %>% mutate(
male = Recode(sex, "'male' = 1; NA=NA; else = 0"),
sex2 = Recode(sex, "c('hermaphroditic','none') = 'other'"),
human = Recode(species, "'Human' = 1; NA=NA; else = 0")
)
mod <- 'mass ~ height + human + sex2 + birth_year + sex2:birth_year'
## This is where the error occurs
fit <- sem(mod, data = sw2, missing = "fiml.x")
## calculate expected marginal means for multiple
## levels of birth_year
BYrange <- range(sw2$birth_year, na.rm = TRUE)
## This is from previous code where 'male' was binary.
## How would I modify the 'range' code here to allow for 3 categories
## in the 'sex2' variable?
malerange <- range(sw2$male, na.rm = TRUE)
em.mass <- emmeans(fit, specs = ~ birth_year | male,
at = list(male = 1:0, birth_year = BYrange),
# because SEMs can have multiple DVs:
lavaan.DV = "mass")
em.mass
## probe effect of birth_year across sex2
rbind(pairs(em.mass))
## plot effect of birth_year across sex2 in the same plot
emmip(em.mass, sex2 ~ birth_year)
Multicategory grouping variables need to be represented using multiple numeric codes that you choose yourself. This makes it impossible for emmeans
to know when a set of (e.g.) dummy codes are actually from a single categorical variable. The ?lavaan2emmeans
help page has an example showing the use of contrasts designed to test specific hypotheses.
But an easier route might be to specify a multigroup model. The limitation of this approach is that any cases with a missing value on the grouping variable will be deleted.
## remove rows with missing sex2 to prevent
## error in emmeans() caused by lavPredict()
sw3 <- sw2[!is.na(sw2$sex2), ]
## fit model without "human" because it
## has no variance in the "other" group
mod <- 'mass ~ height + birth_year'
fit3g <- sem(mod, data = sw3, missing = "fiml.x", group = "sex2",
# optional: set group order to be alphabetical
group.label = c("female","male","other"))
em3g <- emmeans(fit3g, specs = ~ birth_year | sex2,
at = list(birth_year = 10:9),
lavaan.DV = "mass", nesting = NULL)
rbind(pairs(em3g))
Like in my reply to your previous question, the slope of continuous predictor birth_year
can be returned by choosing any at=
values 1 unit apart. Above, I used: at = list(birth_year = 10:9)
, with a higher number first so that the "comparison" returns the correct sign of the slopes:
coef(fit3g)[paste0("mass~birth_year", c("",".g2",".g3"))]