I have a question regarding the degrees of freedom that are used by the lsmeans package in the case of a linear mixed model built with the nlme package.
Here is an example to illustrate my question based on the Oats dataset. I am not trying to discuss if this model is relevant given the dataset, I am just trying to reproduce an issue I had with another dataset ;-).
Oats.lme <- lme(yield ~ Variety, random = ~1 | Block, data = Oats)
anova(Oats.lme)
With the anova, I obtain the expected 64 degrees of freedom.
numDF denDF F-value p-value
(Intercept) 1 64 245.1409 <.0001
Variety 2 64 1.6654 0.1972
Then I use the lsmeans function:
lsmeans(Oats.lme, list(poly ~ Variety))
and I obtain
$`lsmeans of Variety`
Variety lsmean SE df lower.CL upper.CL
Golden Rain 104.5000 7.680866 5 84.75571 124.2443
Marvellous 109.7917 7.680866 5 90.04737 129.5360
Victory 97.6250 7.680866 5 77.88071 117.3693
Confidence level used: 0.95
$`polynomial contrasts of contrast`
contrast estimate SE df t.ratio p.value
linear -6.87500 6.68529 64 -1.028 0.3076
quadratic -17.45833 11.57926 64 -1.508 0.1365
For the contrasts, I obtain the same 64 df, but for the lsmeans themselves, I have only 5 df. I also use SAS, and for the same kind of models, I have the same number of df for both lsmeans and contrasts (which would be 64 with the current example).
I have seen that it might be possible to change degrees of freedom when using the lme4 package, but my code is embedded in an internally-developed tool that is based on nlme, so I am basically stuck with nlme.
Would anyone now why this happens and if this is possible to change it? Or am I missing something?
I initially noticed these reduced degrees of freedom for the lsmeans in one specific case, where my random run effect only had 2 levels, and when I was interested in a Dunnett’s adjustment. Since I am more interested in contrasts than in lsmeans, now that I understood where it comes from, I can still work with it, but I put it there just in case someone has the same error and wants to know why.
I reproduced it below with the Oats data example. The error I obtain happens in the lsmeans:::.qdunnx function and is due to the df for the lsmeans being at 1.
Oats.lme <- lme(yield ~ Variety, random = ~1 | Block, data = subset(Oats,Block %in% c("I","II")))
lsm <- lsmeans(Oats.lme, trt.vs.ctrl ~ Variety)
summary(lsm,adjust = "dunnettx", infer = c(T, T), level = 0.95)
And here is the result
$lsmeans
Variety lsmean SE df lower.CL upper.CL
Golden Rain 123.250 15.88642 1 -78.60608 325.1061
Marvellous 125.500 15.88642 1 -76.35608 327.3561
Victory 115.125 15.88642 1 -86.73108 316.9811
Confidence level used: 0.95
$contrasts
contrast estimate SE df t.ratio p.value
Marvellous - Golden Rain 2.250 12.8697 20 0.175 0.9695
Victory - Golden Rain -8.125 12.8697 20 -0.631 0.7482
P value adjustment: dunnettx method for 2 tests
Error in if (abs(diff(r[1:2])) < 5e-04) return(r[1]) :
missing value where TRUE/FALSE needed
In addition: Warning message:
In qtukey(p, (1 + sqrt(1 + 8 * k))/2, df) : production de NaN
The model says the response variable is subject to two random variations: those due to blocks, and those due to varieties. The means for each variety include both those sources of variation; but comparisons of those means exclude the block variations because the varieties are compared on the same block.
You have only six blocks so there are 5 degrees of freedom for estimating the variations of blocks, and that explains the degrees of freedom for the variety means. There are more degrees of freedom for the comparisons, because you do not have to account for the block variations.
Another matter to consider here is that the support for the nlme package uses a containment method for obtaining degrees of freedom. That essentially involves looking at the worst case scenario for degrees of freedom for each effect. If you instead use the lme4 package and the lmer
function to fit the model, lsmeans
will use a Satterthwaite or Kendall-Roger method to obtain degrees of freedom, and those results might be somewhat larger. However, the degrees of freedom for the means will still be considerably less than those for the comparisons.
Here is some SAS code with the same data and model:
proc mixed data = Oats;
class Variety Block;
model yield = Variety / ddfm = satterth;
random Block;
lsmeans Variety / tdiff;
... and the lsmeans results:
Least Squares Means
Standard
Effect Variety Estimate Error DF t Value Pr > |t|
Variety Golden_R 104.50 7.6809 8.87 13.61 <.0001
Variety Marvello 109.79 7.6809 8.87 14.29 <.0001
Variety Victory 97.6250 7.6809 8.87 12.71 <.0001
Differences of Least Squares Means
Standard
Effect Variety _Variety Estimate Error DF t Value Pr > |t|
Variety Golden_R Marvello -5.2917 6.6853 64 -0.79 0.4316
Variety Golden_R Victory 6.8750 6.6853 64 1.03 0.3076
Variety Marvello Victory 12.1667 6.6853 64 1.82 0.0734
Note that SAS shows 64 df for the comparisons but only 8.87 df for the means themselves, when the Satterthwaite method is used for degrees of freedom.
If one omits the ddfm
option in the model
statement, it defaults to the containment method for df, and it lists 64 df in both tables. However, I believe that SAS is incorrect in its implementation of containment; see my earlier post on this topic in CrossValidated: https://stats.stackexchange.com/questions/140156/degrees-of-freedom-using-containment-method