I'm trying to fit a mixed model to examine differences in warmth and competence ratings depending on the intersection of target age and gender (controlling for race). Participants were asked to rate 2 random targets with different intersectional identities. There are 276 rows of data, 276 unique levels of ResponseId (i.e., 276 participants), 3 age levels (Old, Young, empty) and 3 gender levels (Men, Women, empty).
It appears that ResponseId can't be used as the grouping factor in this model. Does anyone have an idea why?
Here's what I have so far (note that some TargetGender and TargetAge values are intentionally empty, because participants evaluated some targets only on gender or only on age).
Sample data:

```
         ResponseId TargetAge TargetGender TargetAge2 TargetGender2  Warmth1  Warmth2
1 R_3O1E4cOxRIejI1k       Old        Women                    Women 5.363636 5.272727
2 R_1EaFGkyVNdhlgQO       Old        Women                      Men 5.181818 5.181818
3 R_2eVHfsG4p7g0QZE       Old          Men      Young           Men 3.909091 3.545455
4 R_BtYn33qaXVoYh8d       Old          Men      Young           Men 1.363636 2.636364
5 R_d5S9ajl6C9bfTNL       Old        Women                    Women 4.727273 3.909091
6 R_1kXCRRZvdTmYsj7       Old        Women      Young           Men 5.454545 5.545455
```
Sample code and error:

```r
model <- lmer(Warmth1 ~ TargetAge*TargetGender + (1 | ResponseId),
              data = my_data)
```

```
Error: number of levels of each grouping factor must be < number of
  observations (problems: ResponseId)
```
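To see what the check is complaining about, compare the number of rows with the number of distinct `ResponseId` values. The sketch below rebuilds only the ID and warmth columns of the six sample rows shown above and uses base R only. With the full data you would run the counting lines on `my_data` directly.

```r
## ID and warmth columns of the six sample rows from the question (wide format)
my_data <- data.frame(
  ResponseId = c("R_3O1E4cOxRIejI1k", "R_1EaFGkyVNdhlgQO", "R_2eVHfsG4p7g0QZE",
                 "R_BtYn33qaXVoYh8d", "R_d5S9ajl6C9bfTNL", "R_1kXCRRZvdTmYsj7"),
  Warmth1 = c(5.363636, 5.181818, 3.909091, 1.363636, 4.727273, 5.454545),
  Warmth2 = c(5.272727, 5.181818, 3.545455, 2.636364, 3.909091, 5.545455)
)

n_obs    <- sum(!is.na(my_data$Warmth1))        # observations of the response
n_levels <- length(unique(my_data$ResponseId))  # levels of the grouping factor

## lmer() requires n_levels < n_obs; in wide format each participant
## contributes exactly one Warmth1 value, so the two counts are equal
cat("observations:", n_obs, " ResponseId levels:", n_levels, "\n")
```

With the real data both counts would be 276, which is exactly the situation the error message describes.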
Following up on @zephyrl's comment that you need to convert your data to long format ("The error is telling you that since there’s only one row per participant, it doesn’t make sense to nest within participants"):
This is your data from above, modified slightly (adding "1" to the target gender and age variable names for trial 1, to simplify reshaping the data):
```r
dd <- read.csv(header=TRUE, row.names =1, text = "
ResponseId,TargetAge1,TargetGender1,TargetAge2,TargetGender2,Warmth1,Warmth2
1,R_3O1E4cOxRIejI1k,Old,Women,,Women,5.363636,5.272727
2,R_1EaFGkyVNdhlgQO,Old,Women,,Men,5.181818,5.181818
3,R_2eVHfsG4p7g0QZE,Old,Men,Young,Men,3.909091,3.545455
4,R_BtYn33qaXVoYh8d,Old,Men,Young,Men,1.363636,2.636364
5,R_d5S9ajl6C9bfTNL,Old,Women,,Women,4.727273,3.909091
6,R_1kXCRRZvdTmYsj7,Old,Women,Young,Men,5.454545,5.545455
")
```
This is a slightly trickier-than-usual reshaping problem, since the target-age, target-gender, and response (warmth) variables all need to be converted to long format. What I've done here works but is a little clunky; there may well be a Stack Overflow question somewhere that explains how to do this more elegantly.
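```r
library(tidyverse)

## reshape one set of trial-indexed columns (e.g. Warmth1, Warmth2) to long
## format, with a "trial" column holding the suffix ("1" or "2")
dfun <- function(data, nm = "Warmth") {
    data |> dplyr::select(c(ResponseId, starts_with(nm))) |>
        pivot_longer(cols = starts_with(nm), names_prefix = nm,
                     values_to = nm, names_to = "trial")
}

## reshape each variable separately, then join them back by participant and trial
d_long <- (dfun(dd, "Warmth")
    |> left_join(dfun(dd, "TargetAge"), by = c("ResponseId", "trial"))
    |> left_join(dfun(dd, "TargetGender"), by = c("ResponseId", "trial"))
    |> filter(TargetAge != "")  ## drop trials with no target age (cases missing a trial)
)
```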
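For what it's worth, one more compact option is tidyr's `.value` sentinel in `pivot_longer()`, which splits each column name into an output-column part and a trial part in a single call. This is a sketch rather than a tested drop-in replacement: it assumes every trial-specific column name is a non-digit stem followed by the trial number (as in `dd` above), and `d_long2` is just an illustrative name.

```r
library(dplyr)
library(tidyr)

## sample data from the question, trial-1 columns renamed with a "1" suffix
dd <- read.csv(header = TRUE, row.names = 1, text = "
ResponseId,TargetAge1,TargetGender1,TargetAge2,TargetGender2,Warmth1,Warmth2
1,R_3O1E4cOxRIejI1k,Old,Women,,Women,5.363636,5.272727
2,R_1EaFGkyVNdhlgQO,Old,Women,,Men,5.181818,5.181818
3,R_2eVHfsG4p7g0QZE,Old,Men,Young,Men,3.909091,3.545455
4,R_BtYn33qaXVoYh8d,Old,Men,Young,Men,1.363636,2.636364
5,R_d5S9ajl6C9bfTNL,Old,Women,,Women,4.727273,3.909091
6,R_1kXCRRZvdTmYsj7,Old,Women,Young,Men,5.454545,5.545455
")

## ".value" makes the first regex group (the non-digit stem, e.g. "Warmth")
## the name of an output column; the trailing digits go into "trial"
d_long2 <- dd |>
    pivot_longer(cols = -ResponseId,
                 names_to = c(".value", "trial"),
                 names_pattern = "(\\D+)(\\d+)") |>
    filter(TargetAge != "")  ## same rule as above: drop trials with no target age
```

After reshaping, the nine remaining trials come from six participants, so `ResponseId` now has fewer levels than there are observations, which is the condition `lmer()` was checking.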
Now we're ready to fit:

```r
library(lme4)
lmer(Warmth ~ TargetAge + TargetGender + (1|ResponseId), d_long)
```
The maximal model here would be

```r
lmer(Warmth ~ TargetAge + TargetGender +
         (TargetAge + TargetGender|ResponseId),
     data = d_long)
```
because we may need to account for among-participant variation in age and gender effects (see e.g. Barr et al. 2013 "Random effects structure for confirmatory hypothesis testing: Keep it maximal" and Matuschek et al. 2017 "Balancing Type I error and power in linear mixed models").
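One practical caveat, offered as a back-of-the-envelope check: each participant contributes at most two ratings, while the maximal model estimates three random effects per participant (an intercept plus one slope each for age and gender, under R's default treatment contrasts). By default, lme4's `lmerControl()` uses `check.nobs.vs.nRE = "stop"`, which refuses to fit a model whose number of observations is less than or equal to its number of random effects, so the maximal model will probably need simplifying (the kind of trade-off Matuschek et al. discuss). The sketch below counts both quantities for the nine long-format trials derived from the question's sample rows. The data frame is typed in by hand, with row numbers standing in for the `R_...` IDs, and base R's `model.matrix()` serves as a stand-in for the per-participant random-effects design.

```r
## the question's six sample rows in long format: 9 trials after dropping
## trials with no target age (row numbers stand in for the R_... IDs)
d_long <- data.frame(
  ResponseId   = c(1, 2, 3, 3, 4, 4, 5, 6, 6),
  TargetAge    = c("Old", "Old", "Old", "Young", "Old", "Young", "Old", "Old", "Young"),
  TargetGender = c("Women", "Women", "Men", "Men", "Men", "Men", "Women", "Women", "Men")
)

## columns of the per-participant design for (TargetAge + TargetGender | ResponseId):
## (Intercept), TargetAgeYoung, TargetGenderWomen
re_per_participant <- ncol(model.matrix(~ TargetAge + TargetGender, data = d_long))

n_obs        <- nrow(d_long)
n_part       <- length(unique(d_long$ResponseId))
n_re_int     <- 1 * n_part                   # (1 | ResponseId)
n_re_maximal <- re_per_participant * n_part  # (TargetAge + TargetGender | ResponseId)

cat("observations:", n_obs,
    " random effects, intercept only:", n_re_int,
    " maximal:", n_re_maximal, "\n")
```

With the full data the same arithmetic gives at most 552 ratings (276 participants × 2) against 828 random effects (276 × 3) for the maximal model, whereas the random-intercept model needs only 276.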