Search code examples
rr-lavaanemmeans

Plot interaction effect (continuous predictor by 3-category moderator variable) in SEM with observed variables in R


I am estimating an SEM model that has observed variables. I want to use SEM to handle missing data using FIML. My model has an interaction term to test for moderation (continuous predictor by 3-category moderator variable). I want all lines of the interaction to appear in the same plot, so I assume I need to keep the 3-category moderator as a single variable rather than dummy code it. I get the following error when trying to estimate the model:

Error in data[[NAMES[1L]]] * data[[NAMES[2L]]] : non-numeric argument to binary operator

Here is a toy example that illustrates the issue.

library(lavaan)
library(car)
library(dplyr)
library(emmeans)
library(semTools)

data(starwars)

sw2 <- starwars %>% mutate(
  male = Recode(sex, "'male' = 1; NA=NA; else = 0"),
  sex2 = Recode(sex, "c('hermaphroditic','none') = 'other'"),
  human = Recode(species, "'Human' = 1; NA=NA; else = 0")
)

mod <- 'mass ~ height + human + sex2 + birth_year + sex2:birth_year'
## This is where the error occurs
fit <- sem(mod, data = sw2, missing = "fiml.x")

## calculate expected marginal means for multiple 
## levels of birth_year
BYrange <- range(sw2$birth_year, na.rm = TRUE)

## This is from previous code where 'male' was binary. 
## How would I modify the 'range' code here to allow for 3 categories
## in the 'sex2' variable?
malerange <- range(sw2$male, na.rm = TRUE) 
em.mass <- emmeans(fit, specs = ~ birth_year | male, 
                   at = list(male = 1:0, birth_year = BYrange),
                   # because SEMs can have multiple DVs:
                   lavaan.DV = "mass")
em.mass
## probe effect of birth_year across sex2
rbind(pairs(em.mass))
## plot effect of birth_year across sex2 in the same plot
emmip(em.mass, sex2 ~ birth_year)

Solution

  • Multicategory grouping variables need to be represented using multiple numeric codes that you choose yourself. This makes it impossible for emmeans to know when a set of (e.g.) dummy codes are actually from a single categorical variable. The ?lavaan2emmeans help page has an example showing the use of contrasts designed to test specific hypotheses.

    But an easier route might be to specify a multigroup model. The limitation of this approach is that any cases with a missing value on the grouping variable will be deleted.

    ## remove rows with  missing sex2 to prevent
    ## error in emmeans() caused by lavPredict()
    sw3 <- sw2[!is.na(sw2$sex2), ]
    
    ## fit model without "human" because it 
    ## has no variance in the "other" group
    mod <- 'mass ~ height + birth_year'
    fit3g <- sem(mod, data = sw3, missing = "fiml.x", group = "sex2",
                 # optional: set group order to be alphabetical
                 group.label = c("female","male","other"))
    em3g <- emmeans(fit3g, specs = ~ birth_year | sex2,
                    at = list(birth_year = 10:9),
                    lavaan.DV = "mass", nesting = NULL)
    rbind(pairs(em3g))
    

    Like in my reply to your previous question, the slope of continuous predictor birth_year can be returned by choosing any at= values 1 unit apart. Above, I used: at = list(birth_year = 10:9), with a higher number first so that the "comparison" returns the correct sign of the slopes:

    coef(fit3g)[paste0("mass~birth_year", c("",".g2",".g3"))]