Search code examples
rregressionsurvival-analysis

R survival survreg not producing a good fit


I am new to using R, and I am trying to use survival analysis in order to find correlation in censored data. The x data is the envelope mass of protostars. The y data is the intensity of an observed molecular line, and some values are upper limits. The data is:

x <- c(17.299, 4.309, 7.368, 29.382, 1.407, 3.404, 0.450, 0.815, 1.027, 0.549, 0.018)
y <- c(2.37, 0.91, 1.70, 1.97, 0.60, 1.45, 0.25, 0.16, 0.36, 0.88, 0.42)
censor <- c(0, 1, 1, 0, 0, 1, 0, 1, 0, 0, 1)

I am using the function survreg from the R Survival library

modeldata<-survreg(formula=Surv(y,censor)~x, dist="exponential", control = list(maxiter=90))

Which gives the following result:

summary(modeldata)

Call:
survreg(formula = Surv(y, censor) ~ x, dist = "exponential", 
control = list(maxiter = 90))
Value Std. Error     z     p
(Intercept) -0.114      0.568 -0.20 0.841
x            0.153      0.110  1.39 0.163

Scale fixed at 1 

Exponential distribution
Loglik(model)= -6.9   Loglik(intercept only)= -9
Chisq= 4.21 on 1 degrees of freedom, p= 0.04 
Number of Newton-Raphson Iterations: 5 
n= 11

However, when I plot the data and the model using the following method:

plot(x,y,pch=(censor+1))
xnew<-seq(0,30)
model<-predict(modeldata,list(x=xnew))
lines(xnew,model,col="red")

I get this plot of x and y data; triangles are censored data

I am not sure where I am going wrong. I have tried different distributions, but all produce similar results. The same is true when I use other data, for example:

x <- c(1.14, 1.14, 1.19, 0.78, 0.43, 0.24, 0.19, 0.16, 0.17, 0.66, 0.40)

I am also not sure if I am interpreting the results correctly.

I have tried other examples using the same method (e.g. https://stats.idre.ucla.edu/r/examples/asa/r-applied-survival-analysis-ch-1/), and it works well, as far as I can tell.

So my questions are:

  1. Am I using the correct function for fitting the data? If not, which would be more suitable?

  2. If it is the correct function, why is the model not fitting the data even closely? Does it have to do with the plotting?

Thank you for your help.


Solution

  • The "shape" of the relationship looks concave downward, so I would have guessed a ~ log(x) fit might be be more appropriate:

    dfrm <- data.frame( x = c(17.299, 4.309, 7.368, 29.382, 1.407, 3.404, 0.450, 0.815, 1.027, 0.549, 0.018),
    y = c(2.37, 0.91, 1.70, 1.97, 0.60, 1.45, 0.25, 0.16, 0.36, 0.88, 0.42),
    censor= c(0, 1, 1, 0, 0, 1, 0, 1, 0, 0, 1))
    
    modeldata<-survreg(formula=Surv(y,censor)~log(x), data=dfrm, dist="loggaussian", control = list(maxiter=90))
    

    You code seemed appropriate:

    png(); plot(y~x,pch=(censor+1),data=dfrm)
    xnew<-seq(0,30)
    model<-predict(modeldata,list(x=xnew))
    lines(xnew,model,col="red"); dev.off()
    

    enter image description here

    modeldata
    Call:
    survreg(formula = Surv(y, censor) ~ log(x), data = dfrm, dist = "loggaussian", 
        control = list(maxiter = 90))
    
    Coefficients:
    (Intercept)      log(x) 
     0.02092589  0.32536509 
    
    Scale= 0.7861798 
    
    Loglik(model)= -6.6   Loglik(intercept only)= -8.8
        Chisq= 4.31 on 1 degrees of freedom, p= 0.038 
    n= 11