I am new to using R, and I am trying to use survival analysis in order to find correlation in censored data. The x data is the envelope mass of protostars. The y data is the intensity of an observed molecular line, and some values are upper limits. The data is:
x <- c(17.299, 4.309, 7.368, 29.382, 1.407, 3.404, 0.450, 0.815, 1.027, 0.549, 0.018)
y <- c(2.37, 0.91, 1.70, 1.97, 0.60, 1.45, 0.25, 0.16, 0.36, 0.88, 0.42)
censor <- c(0, 1, 1, 0, 0, 1, 0, 1, 0, 0, 1)
I am using the function survreg from the R Survival library
modeldata<-survreg(formula=Surv(y,censor)~x, dist="exponential", control = list(maxiter=90))
Which gives the following result:
summary(modeldata)
Call:
survreg(formula = Surv(y, censor) ~ x, dist = "exponential",
control = list(maxiter = 90))
Value Std. Error z p
(Intercept) -0.114 0.568 -0.20 0.841
x 0.153 0.110 1.39 0.163
Scale fixed at 1
Exponential distribution
Loglik(model)= -6.9 Loglik(intercept only)= -9
Chisq= 4.21 on 1 degrees of freedom, p= 0.04
Number of Newton-Raphson Iterations: 5
n= 11
However, when I plot the data and the model using the following method:
plot(x,y,pch=(censor+1))
xnew<-seq(0,30)
model<-predict(modeldata,list(x=xnew))
lines(xnew,model,col="red")
I get this plot of x and y data; triangles are censored data
I am not sure where I am going wrong. I have tried different distributions, but all produce similar results. The same is true when I use other data, for example:
x <- c(1.14, 1.14, 1.19, 0.78, 0.43, 0.24, 0.19, 0.16, 0.17, 0.66, 0.40)
I am also not sure if I am interpreting the results correctly.
I have tried other examples using the same method (e.g. https://stats.idre.ucla.edu/r/examples/asa/r-applied-survival-analysis-ch-1/), and it works well, as far as I can tell.
So my questions are:
Am I using the correct function for fitting the data? If not, which would be more suitable?
If it is the correct function, why is the model not fitting the data even closely? Does it have to do with the plotting?
Thank you for your help.
The "shape" of the relationship looks concave downward, so I would have guessed a ~ log(x)
fit might be be more appropriate:
dfrm <- data.frame( x = c(17.299, 4.309, 7.368, 29.382, 1.407, 3.404, 0.450, 0.815, 1.027, 0.549, 0.018),
y = c(2.37, 0.91, 1.70, 1.97, 0.60, 1.45, 0.25, 0.16, 0.36, 0.88, 0.42),
censor= c(0, 1, 1, 0, 0, 1, 0, 1, 0, 0, 1))
modeldata<-survreg(formula=Surv(y,censor)~log(x), data=dfrm, dist="loggaussian", control = list(maxiter=90))
You code seemed appropriate:
png(); plot(y~x,pch=(censor+1),data=dfrm)
xnew<-seq(0,30)
model<-predict(modeldata,list(x=xnew))
lines(xnew,model,col="red"); dev.off()
modeldata
Call:
survreg(formula = Surv(y, censor) ~ log(x), data = dfrm, dist = "loggaussian",
control = list(maxiter = 90))
Coefficients:
(Intercept) log(x)
0.02092589 0.32536509
Scale= 0.7861798
Loglik(model)= -6.6 Loglik(intercept only)= -8.8
Chisq= 4.31 on 1 degrees of freedom, p= 0.038
n= 11