Search code examples
rstatisticslme4random-effectsposthoc

Running a post-hoc test on a random effect model with two dummy variables in R


I have a data frame in R - my code is as follows:

library(lme4)
library(lmerTest)
library(multcomp)

#Create DF
df <- data.frame(
  col1 = rep(1:3, each = 3),
  col2 = rep(c("A", "B", "C"), times = 3),
  col3 = rnorm(9)
)
df$A <- ifelse(df$col2 == "A", 1, 0)
df$B <- ifelse(df$col2 == "B", 1, 0)

#Make "col1" into factor, as it's the random effect
df$col1 <- as.factor(df$col1)

I've run this model on it:

model <- lmer(col3 ~ A + B + (1 | col1),
                   data = df)

summary(model)

Which gives me this output:

Linear mixed model fit by REML. t-tests use Satterthwaite's method ['lmerModLmerTest']
Formula: col3 ~ A + B + (1 | col1)
   Data: df

REML criterion at convergence: 23.3

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-1.3966 -0.3950  0.2740  0.4242  1.1226 

Random effects:
 Groups   Name        Variance Std.Dev.
 col1     (Intercept) 0.000    0.000   
 Residual             1.636    1.279   
Number of obs: 9, groups:  col1, 3

Fixed effects:
            Estimate Std. Error      df t value Pr(>|t|)
(Intercept)   1.0363     0.7386  6.0000   1.403    0.210
A            -1.3912     1.0445  6.0000  -1.332    0.231
B            -1.2642     1.0445  6.0000  -1.210    0.272

Correlation of Fixed Effects:
  (Intr) A     
A -0.707       
B -0.707  0.500
optimizer (nloptwrap) convergence code: 0 (OK)
boundary (singular) fit: see help('isSingular')```



Now I'd like to run a post-hoc test, I tried using the code given here

It looks like this:

summary(glht(model, linfct = mcp(col1 = "Tukey")), test = adjusted("holm"))

The error I get is this:

Error in h(simpleError(msg, call)) : 
  error in evaluating the argument 'object' in selecting a method for function 'summary': Variable(s) ‘col1’ have been specified in ‘linfct’ but cannot be found in ‘model’!

Would love help.

TIA!

*In response to a comment by Roland - I'd like to compare between A, B, C - while taking the random effect into consideration.

The mixed effect model compares A vs. C and B vs. C, but I want to have a B vs. C comparison while the random effect is still taken into consideration.

I tried running a permutation test with random effect but I can't seem to find code that works for that, so I'm left with post-hoc. Hope this clears it up!


Solution

  • I'm don't think glht, can do posthocs tests with a random variable.

    I'm also under the impression this is not an oversight from the developers. Statistically, random variables are used for controlling for a categorical variable that we know has an important effect but we could not measure all categories. For example, imagine that we are measuring the effect of a drug on an animal's insulin production, and we took multiple samples of the same animal. Naturally, samples from the same animals are likely to be more similar, so the "animal identity" is an important factor to consider. Yet, we want to make inferences about the entire population of animals, but we cannot sample all individuals to properly account for their effect. So we use random effects to try to gleam the variance between individuals and use that as a proxy for different animal's identities.

    With that reasoning, you can see that post-hoc on random effects makes little statistical sense. If we are doing a post-hoc, it implies that we care about the difference between the specific treatments we measure. They are not mere samples of a bigger set of treatments that we aim to statistically control and remove from our inference. If we care about the difference between treatments A and B then they are, by design, fixed factors.

    DISCLAIMER: this interpretation is of course, not the only one. People often use AIC to test for the significance of random variables, especially in personality studies. So random variables can be used as more than covariates in some cases. But, AFAIK, even in this use it is assumed that the difference between current sampled levels is not as important and is a simple representation of population differences.