
lmer: Error for mixed effects model with random intercept - number of levels of each grouping factor must be < number of observations


I'm currently trying to run a mixed model to examine differences in warmth and competence ratings depending on the intersectionality of target age and gender (race controlled). Participants were asked to rate 2 random targets of different intersectional identities. There are 276 rows of data, 276 unique levels of ResponseId (i.e., 276 participants), 3 age levels (Old, Young, empty) and 3 gender levels (Men, Women, empty).

It appears that using "ResponseId" as the grouping factor is not appropriate here. Does anyone have an idea why?

Here's what I have so far (note: some of the "TargetGender" and "TargetAge" values are intentionally empty, because participants evaluated some targets only on gender or only on age).

Sample data:

```
         ResponseId TargetAge TargetGender TargetAge2 TargetGender2  Warmth1  Warmth2
1 R_3O1E4cOxRIejI1k       Old        Women                    Women   5.363636 5.272727
2 R_1EaFGkyVNdhlgQO       Old        Women                      Men   5.181818 5.181818
3 R_2eVHfsG4p7g0QZE       Old          Men      Young           Men   3.909091 3.545455
4 R_BtYn33qaXVoYh8d       Old          Men      Young           Men   1.363636 2.636364
5 R_d5S9ajl6C9bfTNL       Old        Women                    Women   4.727273 3.909091
6 R_1kXCRRZvdTmYsj7       Old        Women      Young           Men   5.454545 5.545455
```

Sample code and error:

```
model <- lmer(Warmth1 ~ TargetAge*TargetGender + (1 | ResponseId),
              data = my_data)
```

```
Error: number of levels of each grouping factor must be < number of observations (problems: ResponseId)
```
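A minimal base-R sketch of the check behind this error, using the IDs and `Warmth1` values from the sample rows above. In wide format each participant contributes exactly one value of `Warmth1`, so the grouping factor has as many levels as there are observations, and a per-participant random intercept cannot be separated from the residual error.

```r
# Six sample rows from the question, in wide format (one row per participant)
wide <- data.frame(
  ResponseId = c("R_3O1E4cOxRIejI1k", "R_1EaFGkyVNdhlgQO", "R_2eVHfsG4p7g0QZE",
                 "R_BtYn33qaXVoYh8d", "R_d5S9ajl6C9bfTNL", "R_1kXCRRZvdTmYsj7"),
  Warmth1 = c(5.363636, 5.181818, 3.909091, 1.363636, 4.727273, 5.454545)
)

n_obs       <- nrow(wide)                         # observations available to lmer
n_levels    <- nlevels(factor(wide$ResponseId))   # levels of the grouping factor
rows_per_id <- table(wide$ResponseId)             # ratings per participant

# lmer requires n_levels < n_obs; here every participant has a single row
n_levels < n_obs        # FALSE
all(rows_per_id == 1)   # TRUE
```

On the full data set the same comparison gives 276 levels for 276 rows.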

Solution

  • Following up on @zephyrl's comment that you need to convert your data to long format ("The error is telling you that since there’s only one row per participant, it doesn’t make sense to nest within participants"):

    example data

    This is your data from above, modified slightly (adding "1" to the target gender and age variable names for trial 1, to simplify reshaping the data):

    dd <- read.csv(header=TRUE, row.names =1, text = "
    ResponseId,TargetAge1,TargetGender1,TargetAge2,TargetGender2,Warmth1,Warmth2
    1,R_3O1E4cOxRIejI1k,Old,Women,,Women,5.363636,5.272727
    2,R_1EaFGkyVNdhlgQO,Old,Women,,Men,5.181818,5.181818
    3,R_2eVHfsG4p7g0QZE,Old,Men,Young,Men,3.909091,3.545455
    4,R_BtYn33qaXVoYh8d,Old,Men,Young,Men,1.363636,2.636364
    5,R_d5S9ajl6C9bfTNL,Old,Women,,Women,4.727273,3.909091
    6,R_1kXCRRZvdTmYsj7,Old,Women,Young,Men,5.454545,5.545455
    ")
    
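    The rename above was done by hand in the pasted data. A minimal base-R sketch of the same step on the real data frame, assuming it is called `my_data` and has the column names shown in the question (the two-row `my_data` below is a hypothetical stand-in built from the first and third sample rows):

    ```r
    # Hypothetical stand-in for the question's `my_data` (same column names, two sample rows)
    my_data <- data.frame(
      ResponseId    = c("R_3O1E4cOxRIejI1k", "R_2eVHfsG4p7g0QZE"),
      TargetAge     = c("Old", "Old"),
      TargetGender  = c("Women", "Men"),
      TargetAge2    = c("", "Young"),
      TargetGender2 = c("Women", "Men"),
      Warmth1       = c(5.363636, 3.909091),
      Warmth2       = c(5.272727, 3.545455)
    )

    # Append "1" to the trial-1 target columns so every repeated measure
    # is named <variable><trial number>, which the reshaping step relies on
    trial1_cols <- c("TargetAge", "TargetGender")
    names(my_data)[match(trial1_cols, names(my_data))] <- paste0(trial1_cols, "1")

    names(my_data)
    ```

    `dplyr::rename(my_data, TargetAge1 = TargetAge, TargetGender1 = TargetGender)` is the tidyverse equivalent.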

    reshaping

    This is a slightly trickier-than-usual reshaping problem, since the target-age, target-gender, and response (warmth) variables all need to be converted to long format. What I've done here works but is a little clunky; there may well be a Stack Overflow question somewhere that explains how to do this more elegantly.

    library(tidyverse)  # loads dplyr and tidyr
    ## pivot columns nm1, nm2 into one `nm` column plus a `trial` column ("1", "2")
    dfun <- function(data, nm = "Warmth") {
        data |> dplyr::select(c(ResponseId, starts_with(nm))) |>
            pivot_longer(cols = starts_with(nm), names_prefix = nm,
                         values_to = nm, names_to = "trial")
    }
    
    d_long <- (dfun(dd, "Warmth")
        |> left_join(dfun(dd, "TargetAge"), by = c("ResponseId", "trial"))
        |> left_join(dfun(dd, "TargetGender"), by = c("ResponseId", "trial"))
        |> filter(TargetAge != "")  ## drop trials with a blank target age
    )
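    One possibly tidier alternative, sketched under the assumption that tidyr 1.0 or later is available (its `pivot_longer()` accepts the special `".value"` entry in `names_to`): split each column name into a variable part and a trailing trial digit, and let the variable part become the output column name. The block rebuilds `dd` with `data.frame()` so it runs on its own; `d_long2` is just an illustrative name.

    ```r
    library(dplyr)
    library(tidyr)

    # The six sample rows as `dd`, built directly so this block runs on its own
    dd <- data.frame(
      ResponseId    = c("R_3O1E4cOxRIejI1k", "R_1EaFGkyVNdhlgQO", "R_2eVHfsG4p7g0QZE",
                        "R_BtYn33qaXVoYh8d", "R_d5S9ajl6C9bfTNL", "R_1kXCRRZvdTmYsj7"),
      TargetAge1    = rep("Old", 6),
      TargetGender1 = c("Women", "Women", "Men", "Men", "Women", "Women"),
      TargetAge2    = c("", "", "Young", "Young", "", "Young"),
      TargetGender2 = c("Women", "Men", "Men", "Men", "Women", "Men"),
      Warmth1       = c(5.363636, 5.181818, 3.909091, 1.363636, 4.727273, 5.454545),
      Warmth2       = c(5.272727, 5.181818, 3.545455, 2.636364, 3.909091, 5.545455)
    )

    # names_pattern splits e.g. "TargetAge2" into "TargetAge" and "2";
    # ".value" turns the first piece into an output column, the second goes to `trial`
    d_long2 <- dd |>
      pivot_longer(-ResponseId,
                   names_to = c(".value", "trial"),
                   names_pattern = "(.*)(\\d)$") |>
      filter(TargetAge != "")   # drop trials with a blank target age, as above

    d_long2
    ```

    For the sample data this should give the same nine rows as `d_long`, with the columns in a different order.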
    

    Now we're ready to fit:

    library(lme4)
    lmer(Warmth ~ TargetAge + TargetGender + (1|ResponseId), d_long)
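    A quick sketch to confirm that the reshaped data now satisfy the condition in the error message, and to inspect the grouping structure of the fitted model. It assumes lme4 is installed and writes out the nine long-format sample rows (the result of the reshaping step above) by hand so it runs on its own. With only six participants, `lmer` may report a singular (boundary) fit; that reflects the tiny sample rather than the model.

    ```r
    library(lme4)

    # Nine long-format sample rows (one row per rated target),
    # i.e. d_long after dropping trials with a blank target age
    d_long <- data.frame(
      ResponseId   = c("R_3O1E4cOxRIejI1k", "R_1EaFGkyVNdhlgQO",
                       "R_2eVHfsG4p7g0QZE", "R_2eVHfsG4p7g0QZE",
                       "R_BtYn33qaXVoYh8d", "R_BtYn33qaXVoYh8d",
                       "R_d5S9ajl6C9bfTNL",
                       "R_1kXCRRZvdTmYsj7", "R_1kXCRRZvdTmYsj7"),
      TargetAge    = c("Old", "Old", "Old", "Young", "Old", "Young", "Old", "Old", "Young"),
      TargetGender = c("Women", "Women", "Men", "Men", "Men", "Men", "Women", "Women", "Men"),
      Warmth       = c(5.363636, 5.181818, 3.909091, 3.545455, 1.363636, 2.636364,
                       4.727273, 5.454545, 5.545455)
    )

    # The condition lmer checks: fewer participant levels than observations
    table(table(d_long$ResponseId))                    # participants with 1 vs 2 rows
    length(unique(d_long$ResponseId)) < nrow(d_long)   # TRUE: 6 levels, 9 rows

    fit <- lmer(Warmth ~ TargetAge + TargetGender + (1 | ResponseId), data = d_long)
    ngrps(fit)   # participants contributing to the random intercept
    nobs(fit)    # ratings used in the fit
    ```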
    

    The maximal model here would be

    lmer(Warmth ~ TargetAge + TargetGender + 
            (TargetAge + TargetGender|ResponseId), 
            data = d_long)
    

    because we may need to account for among-participant variation in age and gender effects (see e.g. Barr et al. 2013 "Random effects structure for confirmatory hypothesis testing: Keep it maximal" and Matuschek et al. 2017 "Balancing Type I error and power in linear mixed models").
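
    One practical caveat, sketched below on the same nine long-format sample rows (again assuming lme4 is installed): each participant rated only two targets, while the random-slope term `(TargetAge + TargetGender | ResponseId)` needs at least three random effects per participant. By default `lmer` (via `lmerControl(check.nobs.vRE = "stop")`) stops with an error when the number of observations does not exceed the number of random effects. The same arithmetic applies to the full data set: at least 3 × 276 = 828 random effects against at most 552 ratings, so in practice the random-effects structure would have to be simplified.

    ```r
    library(lme4)

    # Nine long-format sample rows, as produced by the reshaping step
    d_long <- data.frame(
      ResponseId   = c("R_3O1E4cOxRIejI1k", "R_1EaFGkyVNdhlgQO",
                       "R_2eVHfsG4p7g0QZE", "R_2eVHfsG4p7g0QZE",
                       "R_BtYn33qaXVoYh8d", "R_BtYn33qaXVoYh8d",
                       "R_d5S9ajl6C9bfTNL",
                       "R_1kXCRRZvdTmYsj7", "R_1kXCRRZvdTmYsj7"),
      TargetAge    = c("Old", "Old", "Old", "Young", "Old", "Young", "Old", "Old", "Young"),
      TargetGender = c("Women", "Women", "Men", "Men", "Men", "Men", "Women", "Women", "Men"),
      Warmth       = c(5.363636, 5.181818, 3.909091, 3.545455, 1.363636, 2.636364,
                       4.727273, 5.454545, 5.545455)
    )

    # Try the maximal model; capture the error instead of stopping the script
    res <- tryCatch(
      lmer(Warmth ~ TargetAge + TargetGender +
             (TargetAge + TargetGender | ResponseId), data = d_long),
      error = function(e) e
    )

    inherits(res, "error")   # TRUE: 9 observations vs 6 x 3 = 18 random effects
    conditionMessage(res)
    ```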