Search code examples
rnlme

nlme::gls() R code on Applied Longitudinal Analysis, 2nd Edition website needs some tweaks


My question is fairly straightforward. When I run the code below, copied directly from the "Applied Longitudinal Analysis" website, here: https://content.sph.harvard.edu/fitzmaur/ala2e/ (It's Chapter 5, Section 5.7),

library(foreign)
ds <- read.dta("tlc.dta")
ds$baseline <- ds$y0
tlclong <- reshape(ds, idvar="id", varying=c("y0","y1","y4","y6"),v.names="y", timevar="time", time=1:4, direction="long")
tlclong <- subset(tlclong, time > 1)
attach(tlclong)

week <- time
week[time==2] <- 1
week[time==3] <- 4
week[time==4] <- 6
time <- time - 1
week.f <- factor(week, c(1,4,6))
change <- y - baseline
cbaseline <- baseline - 26.406

library(nlme)
model <- gls(y ~ I(week.f==1) + I(week.f==4) + I(week.f==6) + I(week.f==1 & trt=="Succimer") + I(week.f==4 & trt=="Succimer") + I(week.f==6 & trt=="Succimer"), corr=corSymm(, form= ~ time | id), weights = varIdent(form = ~ 1 | week.f))

summary(model)

I get output that is completely different in a few very crucial areas.

This is the output I get:

Generalized least squares fit by REML
  Model: y ~ I(week.f == 1) + I(week.f == 4) + I(week.f == 6) + I(week.f ==      1 & trt == "Succimer") + I(week.f == 4 & trt == "Succimer") +      I(week.f == 6 & trt == "Succimer") 
  Data: NULL 
       AIC      BIC    logLik
  2028.922 2076.764 -1001.461

Correlation Structure: General
 Formula: ~time | id 
 Parameter estimate(s):
 Correlation: 
  1 2
2 0  
3 0 0
Variance function:
 Structure: Different standard deviations per stratum
 Formula: ~1 | week.f 
 Parameter estimates:
1 4 6 
1 1 1 

Coefficients:
                                         Value Std.Error   t-value p-value
(Intercept)                             23.646  1.002955 23.576329  0.0000
I(week.f == 1)TRUE                       1.014  1.418393  0.714894  0.4752
I(week.f == 4)TRUE                       0.424  1.418393  0.298930  0.7652
I(week.f == 1 & trt == "Succimer")TRUE -11.138  1.418393 -7.852550  0.0000
I(week.f == 4 & trt == "Succimer")TRUE  -8.556  1.418393 -6.032180  0.0000
I(week.f == 6 & trt == "Succimer")TRUE  -2.884  1.418393 -2.033287  0.0429

 Correlation: 
                                       (Intr) I(.==1 I(.==4 I=1&t=" I=4&t="
I(week.f == 1)TRUE                     -0.707                              
I(week.f == 4)TRUE                     -0.707  0.500                       
I(week.f == 1 & trt == "Succimer")TRUE  0.000 -0.500  0.000                
I(week.f == 4 & trt == "Succimer")TRUE  0.000  0.000 -0.500  0.000         
I(week.f == 6 & trt == "Succimer")TRUE -0.707  0.500  0.500  0.000   0.000 

Standardized residuals:
       Min         Q1        Med         Q3        Max 
-2.3494198 -0.6575048 -0.1467858  0.5279214  6.0826597 

Residual standard error: 7.091964 
Degrees of freedom: 300 total; 293 residual

And here is the output shown on the website linked above:

Generalized least squares fit by REML
Model: y ~ I(week.f == 1) + I(week.f == 4) + I(week.f == 6) + I(week.f == 1 & trt == "Succimer") + I(week.f == 4 & trt == "Succimer") + I(week.f == 6 & trt == "Succimer") 

  Data: NULL 
       AIC      BIC    logLik
  2451.990 2519.544 -1208.995

Correlation Structure: General
 Formula: ~time | id 
 Parameter estimate(s):
 Correlation: 
  1     2     3    
2 0.569            
3 0.568 0.775      
4 0.575 0.581 0.580

Variance function:
 Structure: Different standard deviations per stratum
 Formula: ~1 | week.f 
 Parameter estimates:
       0        1        4        6 
1.000000 1.330103 1.374827 1.529615 

Coefficients:
                                            Value Std.Error   t-value p-value
(Intercept)                             26.406000 0.4998908  52.82354  0.0000
I(week.f == 1)TRUE                      -1.644501 0.7824044  -2.10186  0.0362
I(week.f == 4)TRUE                      -2.231356 0.8073811  -2.76370  0.0060
I(week.f == 6)TRUE                      -2.642065 0.8864616  -2.98046  0.0031
I(week.f == 1 & trt == "Succimer")TRUE -11.340998 1.0931205 -10.37488  0.0000
I(week.f == 4 & trt == "Succimer")TRUE  -8.765288 1.1312570  -7.74827  0.0000
I(week.f == 6 & trt == "Succimer")TRUE  -3.119869 1.2507776  -2.49434  0.0130

 Correlation: 
                                       (Intr) I(.==1 I(.==4 I(.==6 I=1&t="
I(week.f == 1)TRUE                     -0.155                             
I(week.f == 4)TRUE                     -0.136  0.674                      
I(week.f == 6)TRUE                     -0.068  0.381  0.380               
I(week.f == 1 & trt == "Succimer")TRUE  0.000 -0.699 -0.467 -0.265        
I(week.f == 4 & trt == "Succimer")TRUE  0.000 -0.466 -0.701 -0.265  0.667 
I(week.f == 6 & trt == "Succimer")TRUE  0.000 -0.263 -0.263 -0.705  0.376 

                                       I=4&t="

I(week.f == 1)TRUE                            
I(week.f == 4)TRUE                            
I(week.f == 6)TRUE                            
I(week.f == 1 & trt == "Succimer")TRUE        
I(week.f == 4 & trt == "Succimer")TRUE        
I(week.f == 6 & trt == "Succimer")TRUE  0.375 

Standardized residuals:
       Min         Q1        Med         Q3        Max 
-2.1636401 -0.7011814 -0.1426534  0.5374840  5.6570302 
 
Residual standard error: 4.998908 
Degrees of freedom: 400 total; 393 residual

Note that

  1. The AIC, BIC, and LogLik values are all different
  2. The correlation output is completely different. My output gives all zeros, and is missing an entire dimension in the correlation matrix.
  3. My variance function parameter estimates are completely different; also missing the 0 output shown on the website.
  4. The estimated coefficients are wildly different in most cases; also my output is completely missing the week.f == 6 results.
  5. The correlation matrix outputted below the coefficients results is completely different.
  6. My output says 300 degrees of freedom total; Fitzmaurice's says 400 total (the rest of that last section of output is obviously different as well).

Also, when I run the code, I get an error saying

Error in glsEstimate(object, control = control) : 
  computed "gls" fit is singular, rank 7

So, to even get any output at all, I had to add this argument to the gls() function:

control = list(singular.ok = TRUE)

So my question is, why is the output so different when I run the code myself? Clearly whoever made the website neglected to copy the code correctly.

If you would like to try running the code yourself, the website linked above has the dataset tlc.dta available for download.


Solution

  • Changing Fitzmaurice's code slightly, to INCLUDE time == 1 (and therefore week == 0), I get the same output as shown on the website:

    ds <- foreign::read.dta("tlc.dta")
    ds$baseline <- ds$y0
    tlclong <- reshape(ds, idvar="id", varying=c("y0","y1","y4","y6"),
                       v.names="y", timevar="time", time=1:4, direction="long")
    # tlclong <- subset(tlclong, time > 1)
    attach(tlclong)
    
    week <- time
    week[time==1] <- 0
    week[time==2] <- 1
    week[time==3] <- 4
    week[time==4] <- 6
    time <- time - 1
    week.f <- factor(week, c(0,1,4,6))
    change <- y - baseline
    cbaseline <- baseline - 26.406
    
    library(nlme)
    model <- gls(y ~ week.f +
                     I(week.f==1 & trt=="Succimer") +
                     I(week.f==4 & trt=="Succimer") +
                     I(week.f==6 & trt=="Succimer"),
                 corr=corSymm(form= ~ time | id),
                 weights = varIdent(form = ~ 1 | week.f)
                 )
    summary(model)
    

    So in conclusion, I should change the wording of my question from "nlme's output is different now" to "Fitzmaurice forgot to put what his code MUST have certainly been originally, in order to get the output shown on his website". My bad.