Search code examples
rplotsurvival-analysisweibull

Plotting a simple survreg Weibul survivall fit


This is a simpler variation of the question that has been answered at How to plot the survival curve generated by survreg (package survival of R)?

# Create simple Weibull survival fit using library(survival)
  surmo<-survreg( Surv(validtimes, status)~1, dist="weibull")

# Getting Kaplan-Meier
  fKM<-survfit( Surv(validtimes, status)~1)

# Plot Kaplan-Meier
  plot(fKM,xlab="Time,Days",conf.int=TRUE,mark.time=TRUE,ylab="Fraction",main="Kaplan-Meier Plot")

Everything up to this worked fine without any issues.

Problems arose when I wanted to overlay the predicted Weibull fit on the data. Based on the example I used.

 pct <- seq(.01,.99,by=.01)
 maxvalidtimes<-max(validtimes)
# Getting the Weibull lines to overlay
  lines(predict(surmo,newdata=list(1:maxvalidtimes),type="quantile",p=pct),1-pct,col="red")

I get an error

Error in xy.coords(x, y) : 'x' and 'y' lengths differ

I assumed the problem was from the term: newdata=list(1:maxvalidtimes)

I tried to delete the newdata term and also setting newdata=list(1:99) to no avail.

I tried the same thing in the flexsurv package and I got the exact plots I wanted, with little effort.

 # Using flexsurv package here
  surmof  <- flexsurvreg( Surv(validtimes, status)~1,dist='weibull')
  plot(surmof,mark.time=TRUE,xlab="Time,Days",ylab="Fraction",main="FlexSurv Plot")

Solution

  • Since you didn't offer any data, I'm going to modify the last example in the ?predict.survreg page which uses the lung dataset. You don't need any newdata since you only want a quantile-type plot and that requires a vector argument given to p.

    lfit <- survreg(Surv(time, status) ~ 1, data=lung)
    pct <- 1:98/100   # The 100th percentile of predicted survival is at +infinity
    ptime <- predict(lfit,  type='quantile',
                      p=pct, se=TRUE)
     str(ptime)
    #------------
    List of 2
     $ fit   : num [1:228, 1:98] 12.7 12.7 12.7 12.7 12.7 ...
     $ se.fit: num [1:228, 1:98] 2.89 2.89 2.89 2.89 2.89 ...
    

    So you actually have way too many data point and if you look at the 228 lines of data in ptime you will find that every row is the same, so just use the first row.

    identical( ptime$fit[1,], ptime$fit[2,])
    #[1] TRUE
    
    
     str(ptime$fit[1,])
    # num [1:98] 12.7 21.6 29.5 36.8 43.8 ...
    

    So you have a predicted time for every quantile and remember that a survival function is just 1 minus the quantile function and that the y-values are the given quentiles while its the times that form the x-values:

     plot(x=ptime$fit[1,], y=1-pct, type="l")
    

    enter image description here