Search code examples
rlme4mixed-modelsrandom-effects

How to specify year for nested random effect?


I am quite new to R and is setting up a mixed effect model with lmer. I have a large dataset and want to include 3 random effects that are nested. I have data from 2 years, 2021 and 2022, and there are rounds of experiments every year, that are named experiment 1, 2 and 3 and so on on each year. The problem is that for the random effects, the number of experiment rounds are calculated wrong, as the numbering of the experiments are the same for each year. How can I include "year" for the random effects so the number of groups will be correct?

This is a example of the set up of my df :

example<-data.frame("Site" = c(1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,2),
                     "Square" = c(1,1,1,2,2,2,1,1,1,2,2,2,1,1,1,2,2,2,1,1,1,2,2,2),
                     "Round" = c(1,2,3,1,2,3,1,2,3,1,2,3,1,2,3,1,2,3,1,2,3,1,2,3),
                     "Year" = c(2021,2021,2021,2021,2021,2021,2022,2022,2022,2022,2022,2022,2021,2021,2021,2021,2021,2021,2022,2022,2022,2022,2022,2022),
                     "Variable" = c(2,2,2,2,2,2,1,1,1,1,1,1,2,2,2,2,2,2,1,1,1,1,1,1),
                     "Outcome" = c(20,10,39,22,25,29,13,15,12,10,9,15,40,33,42,29,35,50,25,23,22,26,24,23))

Here is my code where I want to investigate Outcome predicted by Variable (fixed effect) with Site , Square and Round as nested random effects:

model8<-lmer(Outcome ~ Variable + (1|Site/Square/Round), 
   data=example)
summary(model8)

And here is the output:

boundary (singular) fit: see help('isSingular')
Linear mixed model fit by REML. t-tests use Satterthwaite's method ['lmerModLmerTest']
Formula: Outcome ~ Variable + (1 | Site/Square/Round)
   Data: example

REML criterion at convergence: 150.7

Scaled residuals: 
     Min       1Q   Median       3Q      Max 
-2.44099 -0.39914 -0.02029  0.45294  2.26577 

Random effects:
 Groups              Name        Variance Std.Dev.
 Round:(Square:Site) (Intercept)  0.00    0.000   
 Square:Site         (Intercept)  0.00    0.000   
 Site                (Intercept) 78.12    8.838   
 Residual                        37.96    6.161   
Number of obs: 24, groups:  Round:(Square:Site), 12; Square:Site, 4; Site, 2

Fixed effects:
            Estimate Std. Error     df t value Pr(>|t|)    
(Intercept)    5.000      7.408  1.813   0.675    0.575    
Variable      13.083      2.515 21.000   5.201 3.73e-05 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Correlation of Fixed Effects:
         (Intr)
Variable -0.509
optimizer (nloptwrap) convergence code: 0 (OK)
boundary (singular) fit: see help('isSingular')

Please do not pay attention to the singularity error, this is just an example. Look at the number of observations:

Number of obs: 24, groups:  Round:(Square:Site), 12; Square:Site, 4; Site, 2

Here the number of rounds is incorrect, it should be 24 (same as number of observations). How should I specify year for the random effect Round? Is it at all possible or I am on the wrong track? The Sites and Squares are the same year after year, but the rounds are unique for each year/the dataset. Or is it just easier to change the Rounds to 1, 2, 3 at 2021 and then 4,5,6 on year 2022 in the actual dataframe? I am just in the process of setting up my model, and I have started with the random effects.


Solution

  • If you want the values of Round to be disambiguated by year you can do this:

     lmer(Outcome ~ Variable + (1|Site/Square) +
       + (1|Year:Site:Square:Round), data = example)
    

    However, this will trigger the following error:

    Error: number of levels of each grouping factor must be < number of observations (problems: Year:Site:Square:Round)

    which occurs because, for any Year/Site/Square/Round combination, there is only one observation in the data set. This means that the variance associated with that random effect is confounded with the residual variance (put another way, these two variances are jointly unidentifiable). (There is another explanation of this here.)

    If your real data set (you said you had a "large dataset") has more than one observation per Year/Site/Square/Round combination, then this won't be an issue.

    I would also say that treating Site as a random effect if there are only two levels is unlikely to work well. (Again, maybe you have more sites in your real data set.)