
Modeling longitudinal correlation correctly with nlme when there are missing outcomes (R)


I have longitudinal data in long format which look, for the first two subjects, like this:

  id X  M     Y
1  1 0 M1  2.53
2  1 0 M2  1.45
3  1 0 M3  1.17
4  1 0 M5  0.78
5  1 0 M7 -0.95
6  1 0 M9 -0.07
7  2 1 M1 -0.81
8  2 1 M2 -1.66
9  2 1 M3 -0.01
10 2 1 M5  0.39

M1 to M9 denote nine fixed measurement occasions. As is typical of longitudinal data, some outcomes Y are missing: subject 1 has no outcomes at M4, M6 and M8, and subject 2 has none at M4 or at M6 to M9. Other subjects are missing outcomes at different occasions.
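The ten rows shown above can be rebuilt as a data frame to make the missing pattern explicit. This is a sketch over the two displayed subjects only; the factor levels M1 to M9 and the name missing_by_id are my own choices.

```r
# The ten rows shown above (first two subjects only)
dat <- data.frame(
  id = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2),
  X  = c(0, 0, 0, 0, 0, 0, 1, 1, 1, 1),
  M  = factor(c("M1", "M2", "M3", "M5", "M7", "M9",
                "M1", "M2", "M3", "M5"),
              levels = paste0("M", 1:9)),
  Y  = c(2.53, 1.45, 1.17, 0.78, -0.95, -0.07, -0.81, -1.66, -0.01, 0.39)
)

# For each subject, the occasions that have no row, i.e. missing outcomes
missing_by_id <- lapply(split(dat$M, dat$id),
                        function(m) setdiff(levels(m), as.character(m)))
missing_by_id
# subject 1: M4 M6 M8; subject 2: M4 M6 M7 M8 M9
```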

A random-intercept model fitted with lme, with the occasions M and the covariate X as fixed effects, is

lme(fixed = Y ~ M + X, random = ~ 1 | id, data = dat)
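Why a random intercept implies compound symmetry: with Y_ij = x_ij'β + b_i + e_ij, b_i ~ N(0, σ_b²) and independent e_ij ~ N(0, σ²), any two outcomes of the same subject have covariance σ_b² and correlation σ_b² / (σ_b² + σ²), whichever occasions they were measured at. The sketch below builds that implied correlation matrix; the helper ri_marginal_cor and the variance values are hypothetical.

```r
# Implied marginal correlation of one subject's outcomes under a
# random-intercept model: Var(Y_i) = sigma_b2 * 1 1' + sigma2 * I
ri_marginal_cor <- function(n_obs, sigma_b2, sigma2) {
  Z <- matrix(1, nrow = n_obs, ncol = 1)          # random-intercept design
  V <- sigma_b2 * Z %*% t(Z) + sigma2 * diag(n_obs)
  cov2cor(V)                                      # covariance -> correlation
}

# Hypothetical variances: sigma_b2 = 1, sigma2 = 3
R6 <- ri_marginal_cor(n_obs = 6, sigma_b2 = 1, sigma2 = 3)
round(R6, 2)
# every off-diagonal entry is 1 / (1 + 3) = 0.25
```

Because every pair of outcomes gets the same correlation, it does not matter which occasions a subject missed, so the ordering problem does not arise for this model; it only arises once the correlation is allowed to differ between pairs of occasions.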

It is well known that this random-intercept model implicitly imposes a compound-symmetry correlation structure, and that the estimates are consistent as long as the missing outcomes are missing at random (MAR). If compound symmetry is not plausible, one option is to add random slopes; another is to specify a different correlation structure, such as an unstructured one:

lme(fixed = Y ~ M + X, random = ~ 1 | id, data = dat, correlation = corSymm())

The output then includes a within-group correlation matrix for M1 to M9. However, how does lme know which occasion each observation belongs to, that is, how the occasions are ordered and where two observed outcomes are not adjacent? For id 1, for example, it looks as though lme will treat its six measurements as M1, M2, ..., M6 instead of M1, M2, M3, M5, M7 and M9. I am therefore concerned that the unstructured correlation matrix is estimated incorrectly. Is there a way to tell lme which occasion each Y belongs to?
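The concern seems justified: according to the nlme documentation, corSymm() with its default form = ~ 1 uses the order of the observations within each group as the covariate. The sketch below compares that position index with the occasion number for the two subjects shown; the column names position and T are illustrative.

```r
# The rows shown above (first two subjects); Y is omitted because only the
# ordering of the rows matters here
dat <- data.frame(
  id = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2),
  M  = c("M1", "M2", "M3", "M5", "M7", "M9", "M1", "M2", "M3", "M5")
)

# Index used by default (form = ~ 1): the row's position within its subject
dat$position <- ave(seq_len(nrow(dat)), dat$id, FUN = seq_along)

# Index that should be used: the occasion number taken from the label
dat$T <- as.integer(sub("M", "", dat$M))

# Rows where the two disagree
dat[dat$position != dat$T, ]
```

For subject 1 the rows at occasions 5, 7 and 9 would be indexed 4, 5 and 6, and for subject 2 the row at M5 would be indexed 4, so their correlations would be attributed to the wrong pairs of occasions.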


Solution

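  • Since M1 to M9 are fixed measurement occasions, their numbers can be extracted into a numeric time variable T, which can then be treated as a continuous covariate in the fixed part (this assumes a linear trend over the occasions):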

    library(nlme)
    # numeric time from the occasion labels: "M1" -> 1, ..., "M9" -> 9
    dat$T <- as.integer(sub("M", "", dat$M))
    lme(fixed = Y ~ T + X, random = ~ 1 | id, data = dat)
    

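    Alternatively, M can stay a factor in the fixed part and T can be passed to the correlation structure as its time covariate. corSymm then uses T, rather than the order of the rows within each subject, to place every observation at its occasion. Inside lme, form = ~ T alone takes its grouping from the random part; writing | id makes the grouping explicit: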

    corSymm(form = ~ T | id)
    

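    Note that T must be integer-valued, and its distinct values across the whole data set must form a sequence of consecutive integers (here 1 to 9) for it to work as the time covariate of corSymm. Individual subjects may skip values, but each value may appear at most once per subject.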
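A minimal end-to-end sketch on simulated data (the number of subjects, the four occasions and the true correlation are all hypothetical): it fits the unstructured correlation with the occasion index passed as form = ~ T | id and then reads off the estimated matrix for a subject observed at every occasion. It uses gls() from nlme rather than lme() because, as far as I can tell, a random intercept on top of an unrestricted corSymm() with a common variance is not separately identifiable (the intercept variance only adds a constant to every covariance); the correlation argument is the same for lme(). Subject 1 is kept complete only so that its 4 x 4 matrix can be extracted with corMatrix().

```r
library(nlme)

set.seed(42)
n_id  <- 500   # hypothetical number of subjects
n_occ <- 4     # hypothetical: four occasions M1..M4, time T = 1..4
rho   <- 0.5   # hypothetical true lag-1 correlation (AR(1)-type truth)

# Balanced data first: one row per subject and occasion, sorted by id then T
full   <- expand.grid(T = 1:n_occ, id = 1:n_id)
full$M <- factor(paste0("M", full$T))
full$X <- rep(rbinom(n_id, 1, 0.5), each = n_occ)

# Errors with correlation rho^|t - s| between occasions t and s;
# chol() gives upper-triangular U with t(U) %*% U = Sigma
Sigma  <- rho^abs(outer(1:n_occ, 1:n_occ, "-"))
E      <- matrix(rnorm(n_id * n_occ), n_id, n_occ) %*% chol(Sigma)
full$Y <- 1 + 0.5 * full$X + as.vector(t(E))

# Drop about 30% of outcomes completely at random, keeping every subject's
# first occasion and keeping subject 1 complete
keep <- full$T == 1 | full$id == 1 | runif(nrow(full)) > 0.3
dat  <- full[keep, ]

# T tells corSymm which occasion each remaining row belongs to
fit <- gls(Y ~ M + X, data = dat,
           correlation = corSymm(form = ~ T | id))

# Estimated within-subject correlation matrix for subject 1 (all occasions)
R_hat <- corMatrix(fit$modelStruct$corStruct)[[1]]
round(R_hat, 2)
```

With the default form = ~ 1 instead, a subject observed at M1, M2 and M4 would have its third observation treated as M3, which is exactly the mismatch raised in the question.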