How can I predict using an AFT model with the survival package in R?


I am using an accelerated failure time (AFT) model with a Weibull distribution to predict data, using the survival package in R. I split my data into training and test sets, fit the model on the training set, and then try to predict the values for the test set. To do that, I pass the test set as the newdata parameter, as stated in the references. I get an error saying that newdata does not have the same size as the training data (obviously!). The function then seems to predict the values for the training set instead.

How can I predict the values for the new data?

# get data
library(KMsurv)
library(survival)
data("kidtran") 
n = nrow(kidtran)
kidtran <- kidtran[sample(n),] # shuffle row-wise
kidtran.train = kidtran[1:(n * 0.8),]
kidtran.test = kidtran[(n * 0.8):n,]

# create model 
aftmodel <- survreg(kidtransurv~kidtran.train$gender+kidtran.train$race+kidtran.train$age, dist = "weibull")
predicted <- predict(aftmodel, newdata = kidtran.test)

Edit: As Hack-R pointed out, this line of code was missing:

kidtransurv <- Surv(kidtran.train$time, kidtran.train$delta)

Solution

  • The problem seems to be in your specification of the dependent variable.

    The definition of the dependent variable was missing from your question, so I can't see what the specific mistake was, but it did not appear to be a proper Surv() survival object (see ?survreg).

    This variation on your code builds the response with Surv() inside the formula, refers to the columns by name via data = kidtran.train, makes some minor formatting improvements, and runs fine:

    library(KMsurv)    # provides the kidtran data set
    library(survival)  # provides Surv(), survreg() and predict()
    data("kidtran")
    
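    n       <- nrow(kidtran)
    n.train <- floor(n * 0.8)  # whole number of training rows
    
    kidtran       <- kidtran[sample(n), ]         # shuffle rows; no seed is set, so results vary
    kidtran.train <- kidtran[1:n.train, ]
    kidtran.test  <- kidtran[(n.train + 1):n, ]   # start after the last training row, no overlap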
    
    # Whatever kidtransurv was supposed to be is missing from your question,
    #   so I will replace it with something not-missing
    #   and I will make it into a proper survival object with Surv()
    
    aftmodel  <- survreg(Surv(time, delta) ~ gender + race + age, dist = "weibull", data = kidtran.train)
    predicted <- predict(aftmodel, newdata = kidtran.test)
    
    
    head(predicted)
    
           302        636        727        121         85        612 
     33190.413  79238.898 111401.546  16792.180   4601.363  17698.895
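
A likely reason the original call fell back to the training data is the use of `kidtran.train$...` inside the formula: those terms are tied to the training data frame, so predict() has nothing to look up by name in newdata. The sketch below illustrates this under one loud assumption: it uses the `lung` data set shipped with survival as a stand-in for kidtran, and the names `train`, `test`, `bad` and `good` are made up for the illustration.

```r
library(survival)

# Assumption: `lung` (bundled with survival) stands in for kidtran.
# In lung, status is coded 1 = censored, 2 = dead.
train <- lung[1:180, ]
test  <- lung[181:228, ]

# Terms hard-wired to `train` with `$`: they are evaluated against the
# `train` object itself, whatever is passed as newdata.
bad <- survreg(Surv(train$time, train$status == 2) ~ train$age + train$sex,
               dist = "weibull")

# Bare column names plus `data =`: the same names can be looked up in newdata.
good <- survreg(Surv(time, status == 2) ~ age + sex,
                data = train, dist = "weibull")

# The first call is expected to warn about mismatched row counts,
# as described in the question, so the warning is suppressed here.
pred_bad  <- suppressWarnings(predict(bad, newdata = test))
pred_good <- predict(good, newdata = test)

# Compare the number of predictions with the number of rows in each set.
c(bad = length(pred_bad), good = length(pred_good),
  train = nrow(train), test = nrow(test))
```

By default predict() on a survreg fit uses type = "response", which returns values on the original time scale. Passing type = "quantile" with p = 0.5 would return predicted median survival times instead.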