Tags: r, statistics, mixed-models, nlme

Predict new values from a mixed model fitted with lme() in R


I have the following data:

str(growth_data)
tibble [92 × 4] (S3: tbl_df/tbl/data.frame)
 $ person: num [1:92] 1 1 1 1 2 2 2 2 3 3 ...
 $ gender: chr [1:92] "F" "F" "F" "F" ...
 $ growth: num [1:92] 21 20 21.5 23 21 21.5 24 25.5 20.5 24 ...
 $ age   : Factor w/ 4 levels "8","10","12",..: 1 2 3 4 1 2 3 4 1 2 ...

From this, using the lme() function in the nlme package, I fitted the following model:

# Fitting a mixed model with a random coefficient and unstructured covariance structure.
unstructured_rand <- nlme::lme(growth ~ gender*age, 
                     random = ~ age | person, 
                     data=growth_data, 
                     correlation = corSymm())

I am trying to produce a set of predictions for new age values, not in my data, for persons in my data. Specifically, I want to produce a prediction for person 1 at age 13.

I have tried, in vain, to use the predict() function while specifying the newdata argument, like so:

newGrowth <- expand.grid(
  person = unique(growth_data$person),
  gender = c("F","M"),
  age = c(13,15,17,20)
)

newGrowth$Predicted_Response <- predict(unstructured_rand, newdata = newGrowth)

However, I keep running into the following error:

Error in `Names<-.pdMat`(`*tmp*`, value = value[[i]]) : 
  Length of names should be 4

This seems to suggest that my newdata does not have the correct number of columns, but in all the other posts on this subject, I have never seen anyone need to match the number of columns in newdata. Further, the only column in my data that is not in the newdata data frame is growth, which is the variable I am trying to predict.

What am I missing? There seems to be some obvious element from the documentation on predict.lme() that I am failing to apply to my data, but I cannot figure out what it is.

Any help would be much appreciated!


Solution

  • One issue (or maybe the issue at hand) is that you fit a model on data where age was a factor and then tried to predict on data where age was continuous.

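    Because you did not supply your data, I can't be certain this is the same issue. However, the Orthodont data shipped with nlme is similar to yours, and the example that follows produces an error with the same wording. A likely reason for the "4" in the message: with a four-level factor, random = ~ age | person implies four random-effect terms per person (an intercept plus three level contrasts), whereas a numeric age in newdata yields only two (an intercept and a slope).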

    Similar Error

    library(nlme)
    
    # make some data like yours
    orthodont <- Orthodont
    orthodont$age <- factor(orthodont$age)
    
    # fit a model similar to yours
    fm1 <- lme(distance ~ age, orthodont, random = ~ age | Subject)
    
    # make some new data like your new data
    newOrth <- data.frame(Sex = c("Male","Male","Female","Female","Male","Male"),
                          age = c(15, 20, 10, 12, 2, 4),
                          Subject = c("M01","M01","F30","F30","M04","M04"))
    
    # attempt prediction and notice same error
    predict(fm1, newOrth, level = 0:1)
    #> Warning in model.frame.default(formula = asOneFormula(formula(reSt), fixed), :
    #> variable 'age' is not a factor
    #> Error in `Names<-.pdMat`(`*tmp*`, value = value[[i]]): Length of names should be 4
    

    A Fix

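    Fit the model on data with a continuous age variable and use that model for prediction. This matters especially because you are trying to extrapolate beyond the ages on which the model was fit: a factor has no level for an unobserved age such as 13, whereas a numeric slope can be evaluated at any age.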

    # change factor to numeric to match new data
    orthodont$age <- as.numeric(as.character(orthodont$age))
    
    # refit
    fm2 <- lme(distance ~ age, orthodont, random = ~ age | Subject)
    
    # attempt prediction again
    predict(fm2, newOrth, level = 0:1)
    #>   Subject predict.fixed predict.Subject
    #> 1     M01      26.66389        30.95074
    #> 2     M01      29.96481        35.33009
    #> 3     F30      23.36296              NA
    #> 4     F30      24.68333              NA
    #> 5     M04      18.08148        20.95016
    #> 6     M04      19.40185        22.13877
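
To tie this back to the original goal (a prediction for one person at an age that was never observed), here is a minimal sketch. It assumes Orthodont is an acceptable stand-in for growth_data; the names fit and new_subjects are illustrative and do not come from the question. It also shows where the NA values in the output above most likely come from: F30 is not a Subject in Orthodont, so there are no estimated random effects to add at level 1.

```r
library(nlme)

# Orthodont stands in for the asker's growth_data (an assumption);
# age is numeric here, so any age value can be used for prediction.
fit <- lme(distance ~ age, data = Orthodont, random = ~ age | Subject)

# One subject that was in the fitting data and one that was not,
# both at the unobserved age 13.
new_subjects <- data.frame(Subject = c("M01", "F30"), age = 13)

# level = 0 gives the population (fixed-effects) prediction,
# level = 1 adds the subject-specific random effects.
pred <- predict(fit, newdata = new_subjects, level = 0:1)
print(pred)

# The level-1 prediction is available only for the known subject.
known   <- !is.na(pred$predict.Subject)
unknown <- is.na(pred$predict.Subject)
```

For a person who is not in the fitting data, the population-level column (predict.fixed) is the only prediction the model can offer.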
    

    Created on 2024-05-03 with reprex v2.1.0
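
Since the error message does not point at the real cause, it can help to compare column classes before calling predict(). The following is a rough sketch rather than anything built into nlme; fit_data and new_data are hypothetical stand-ins for the fitting data and the new data. It also illustrates why the factor is converted through as.character() in the fix above.

```r
# Hypothetical stand-ins: age is a factor in the fitting data
# and numeric in the new data, as in the question.
fit_data <- data.frame(age = factor(c(8, 10, 12, 14)))
new_data <- data.frame(age = c(13, 15))

# Compare the first class of every column the two data frames share.
shared    <- intersect(names(fit_data), names(new_data))
fit_class <- vapply(fit_data[shared], function(x) class(x)[1], character(1))
new_class <- vapply(new_data[shared], function(x) class(x)[1], character(1))
mismatched <- shared[fit_class != new_class]
print(mismatched)

# Converting a factor of numbers: as.numeric() alone returns the
# internal level codes, so go through as.character() to get the labels.
codes  <- as.numeric(fit_data$age)                # 1 2 3 4
values <- as.numeric(as.character(fit_data$age))  # 8 10 12 14
```

Any variable listed in mismatched is worth aligning (either in the fitting data, followed by a refit, or in the new data) before attempting prediction.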