My question is fairly straightforward. When I run the code below, copied directly from the "Applied Longitudinal Analysis" website, here: https://content.sph.harvard.edu/fitzmaur/ala2e/ (It's Chapter 5, Section 5.7),
library(foreign)
ds <- read.dta("tlc.dta")
ds$baseline <- ds$y0
tlclong <- reshape(ds, idvar="id", varying=c("y0","y1","y4","y6"),v.names="y", timevar="time", time=1:4, direction="long")
tlclong <- subset(tlclong, time > 1)
attach(tlclong)
week <- time
week[time==2] <- 1
week[time==3] <- 4
week[time==4] <- 6
time <- time - 1
week.f <- factor(week, c(1,4,6))
change <- y - baseline
cbaseline <- baseline - 26.406
library(nlme)
model <- gls(y ~ I(week.f==1) + I(week.f==4) + I(week.f==6) + I(week.f==1 & trt=="Succimer") + I(week.f==4 & trt=="Succimer") + I(week.f==6 & trt=="Succimer"), corr=corSymm(, form= ~ time | id), weights = varIdent(form = ~ 1 | week.f))
summary(model)
I get output that is completely different in a few very crucial areas.
This is the output I get:
Generalized least squares fit by REML
Model: y ~ I(week.f == 1) + I(week.f == 4) + I(week.f == 6) + I(week.f == 1 & trt == "Succimer") + I(week.f == 4 & trt == "Succimer") + I(week.f == 6 & trt == "Succimer")
Data: NULL
AIC BIC logLik
2028.922 2076.764 -1001.461
Correlation Structure: General
Formula: ~time | id
Parameter estimate(s):
Correlation:
1 2
2 0
3 0 0
Variance function:
Structure: Different standard deviations per stratum
Formula: ~1 | week.f
Parameter estimates:
1 4 6
1 1 1
Coefficients:
Value Std.Error t-value p-value
(Intercept) 23.646 1.002955 23.576329 0.0000
I(week.f == 1)TRUE 1.014 1.418393 0.714894 0.4752
I(week.f == 4)TRUE 0.424 1.418393 0.298930 0.7652
I(week.f == 1 & trt == "Succimer")TRUE -11.138 1.418393 -7.852550 0.0000
I(week.f == 4 & trt == "Succimer")TRUE -8.556 1.418393 -6.032180 0.0000
I(week.f == 6 & trt == "Succimer")TRUE -2.884 1.418393 -2.033287 0.0429
Correlation:
(Intr) I(.==1 I(.==4 I=1&t=" I=4&t="
I(week.f == 1)TRUE -0.707
I(week.f == 4)TRUE -0.707 0.500
I(week.f == 1 & trt == "Succimer")TRUE 0.000 -0.500 0.000
I(week.f == 4 & trt == "Succimer")TRUE 0.000 0.000 -0.500 0.000
I(week.f == 6 & trt == "Succimer")TRUE -0.707 0.500 0.500 0.000 0.000
Standardized residuals:
Min Q1 Med Q3 Max
-2.3494198 -0.6575048 -0.1467858 0.5279214 6.0826597
Residual standard error: 7.091964
Degrees of freedom: 300 total; 293 residual
And here is the output shown on the website linked above:
Generalized least squares fit by REML
Model: y ~ I(week.f == 1) + I(week.f == 4) + I(week.f == 6) + I(week.f == 1 & trt == "Succimer") + I(week.f == 4 & trt == "Succimer") + I(week.f == 6 & trt == "Succimer")
Data: NULL
AIC BIC logLik
2451.990 2519.544 -1208.995
Correlation Structure: General
Formula: ~time | id
Parameter estimate(s):
Correlation:
1 2 3
2 0.569
3 0.568 0.775
4 0.575 0.581 0.580
Variance function:
Structure: Different standard deviations per stratum
Formula: ~1 | week.f
Parameter estimates:
0 1 4 6
1.000000 1.330103 1.374827 1.529615
Coefficients:
Value Std.Error t-value p-value
(Intercept) 26.406000 0.4998908 52.82354 0.0000
I(week.f == 1)TRUE -1.644501 0.7824044 -2.10186 0.0362
I(week.f == 4)TRUE -2.231356 0.8073811 -2.76370 0.0060
I(week.f == 6)TRUE -2.642065 0.8864616 -2.98046 0.0031
I(week.f == 1 & trt == "Succimer")TRUE -11.340998 1.0931205 -10.37488 0.0000
I(week.f == 4 & trt == "Succimer")TRUE -8.765288 1.1312570 -7.74827 0.0000
I(week.f == 6 & trt == "Succimer")TRUE -3.119869 1.2507776 -2.49434 0.0130
Correlation:
(Intr) I(.==1 I(.==4 I(.==6 I=1&t="
I(week.f == 1)TRUE -0.155
I(week.f == 4)TRUE -0.136 0.674
I(week.f == 6)TRUE -0.068 0.381 0.380
I(week.f == 1 & trt == "Succimer")TRUE 0.000 -0.699 -0.467 -0.265
I(week.f == 4 & trt == "Succimer")TRUE 0.000 -0.466 -0.701 -0.265 0.667
I(week.f == 6 & trt == "Succimer")TRUE 0.000 -0.263 -0.263 -0.705 0.376
I=4&t="
I(week.f == 1)TRUE
I(week.f == 4)TRUE
I(week.f == 6)TRUE
I(week.f == 1 & trt == "Succimer")TRUE
I(week.f == 4 & trt == "Succimer")TRUE
I(week.f == 6 & trt == "Succimer")TRUE 0.375
Standardized residuals:
Min Q1 Med Q3 Max
-2.1636401 -0.7011814 -0.1426534 0.5374840 5.6570302
Residual standard error: 4.998908
Degrees of freedom: 400 total; 393 residual
Note that
Also, when I run the code, I get an error saying
Error in glsEstimate(object, control = control) :
computed "gls" fit is singular, rank 7
So, to even get any output at all, I had to add this argument to the gls() function:
control = list(singular.ok = TRUE)
So my question is, why is the output so different when I run the code myself? Clearly whoever made the website neglected to copy the code correctly.
If you would like to try running the code yourself, the website linked above has the dataset tlc.dta available for download.
Changing Fitzmaurice's code slightly, to INCLUDE time == 1 (and therefore week == 0), I get the same output as shown on the website:
ds <- foreign::read.dta("tlc.dta")
ds$baseline <- ds$y0
tlclong <- reshape(ds, idvar="id", varying=c("y0","y1","y4","y6"),
v.names="y", timevar="time", time=1:4, direction="long")
# tlclong <- subset(tlclong, time > 1)
attach(tlclong)
week <- time
week[time==1] <- 0
week[time==2] <- 1
week[time==3] <- 4
week[time==4] <- 6
time <- time - 1
week.f <- factor(week, c(0,1,4,6))
change <- y - baseline
cbaseline <- baseline - 26.406
library(nlme)
model <- gls(y ~ week.f +
I(week.f==1 & trt=="Succimer") +
I(week.f==4 & trt=="Succimer") +
I(week.f==6 & trt=="Succimer"),
corr=corSymm(form= ~ time | id),
weights = varIdent(form = ~ 1 | week.f)
)
summary(model)
So in conclusion, I should change the wording of my question from "nlme's output is different now" to "Fitzmaurice forgot to put what his code MUST have certainly been originally, in order to get the output shown on his website". My bad.