I am using an accelerated failure time (AFT) model with a Weibull distribution to predict data, using the survival package in R. I split my data into training and test sets, fit the model on the training set, and then try to predict the values for the test set. To do that I pass the test set as the newdata parameter, as stated in the references. I get an error saying that newdata does not have the same size as the training data (obviously!). The function then seems to predict the values for the training set instead.
How can I predict the values for the new data?
# get data
n = nrow(kidtran)
kidtran <- kidtran[sample(n),] # shuffle row-wise
kidtran.train = kidtran[1:(n * 0.8),]
kidtran.test = kidtran[(n * 0.8):n,]
# create model
aftmodel <- survreg(kidtransurv~kidtran.train$gender+kidtran.train$race+kidtran.train$age, dist = "weibull")
predicted <- predict(aftmodel, newdata = kidtran.test)
Edit: As Hack-R pointed out, this line of code was missing:
kidtransurv <- Surv(kidtran.train$time, kidtran.train$delta)
The problem seems to be in your specification of the dependent variable.
The data and the definition of the dependent variable were missing from your question, so I can't see what the specific mistake was, but it did not appear to be a proper Surv() survival object (see ?survreg).
This variation on your code fixes that, makes some minor formatting improvements, and runs fine. It also uses bare column names together with data = kidtran.train, so predict() can look the same names up in newdata:
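library(survival)
n <- nrow(kidtran)
kidtran <- kidtran[sample(n),] # shuffle row-wise
n.train <- floor(n * 0.8) # whole number of training rows
kidtran.train <- kidtran[1:n.train,]
kidtran.test <- kidtran[(n.train + 1):n,] # start after the last training row so the sets do not overlap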
# Whatever kidtransurv was supposed to be is missing from your question,
# so I will replace it with something not-missing
# and I will make it into a proper survival object with Surv()
aftmodel <- survreg(Surv(time, delta) ~ gender + race + age, dist = "weibull", data = kidtran.train)
predicted <- predict(aftmodel, newdata = kidtran.test)
       302        636        727        121         85        612
 33190.413  79238.898 111401.546  16792.180   4601.363  17698.895
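For anyone who wants to try the pattern without the kidtran data, here is a minimal, self-contained sketch of the same approach. It is only an illustration under stated assumptions: the lung data set that ships with the survival package stands in for kidtran, the covariates age and sex stand in for gender, race and age, and the names d, train, test, fit and pred are made up for this example.

```r
library(survival)

# Assumption: `lung` (bundled with survival) stands in for kidtran.
set.seed(1)  # make the shuffle reproducible
d <- lung[sample(nrow(lung)), c("time", "status", "age", "sex")]

# 80/20 split with no shared rows between the two sets
n_train <- floor(0.8 * nrow(d))
train <- d[seq_len(n_train), ]
test  <- d[(n_train + 1):nrow(d), ]

# Bare column names plus data = train, so predict() can find
# the same column names in newdata later on.
fit <- survreg(Surv(time, status) ~ age + sex, data = train, dist = "weibull")

# One prediction per row of the test set
pred <- predict(fit, newdata = test)
length(pred) == nrow(test)
```

If the last line is TRUE, the predictions were computed for the test rows rather than silently falling back to the training data.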