I am trying to understand what I am seeing from the output of coxph
in R.
In the following replicatiable code:
`library(survival)
library(ggfortify)
#Kaplan-Meier
kmfit = survfit(Surv(time, status)~sex, data=lung)
autoplot(kmfit, conf.int = F) #two lines, one for each gender
`
`#coxph
coxfit = survfit(coxph(Surv(time, status)~sex, data=lung))
autoplot(coxfit)
`
Why do I only see 1 line? Should I be seeing two lines? How can I get 2 lines?
The answer is included in the documentation ?survfit.coxph
If the newdata argument is missing, then a curve is produced for a single "pseudo" subject with covariate values equal to the means of the data set. The resulting curve(s) almost never make sense, but The default remains due to an unwarranted attachment to the option shown by some users and by other packages. Two particularly egregious examples are factor variables and interactions. Suppose one were studying interspecies transmission of a virus, and the data set has a factor variable with levels ("pig", "chicken") and about equal numbers of observations for each. The “mean” covariate level will be 0.5 – is this a flying pig? As to interactions assume data with sex coded as 0/1, ages ranging from 50 to 80, and a model with age*sex. The “mean” value for the age:sex interaction term will be about 30, a value that does not occur in the data. Users are strongly advised to use the newdata argument. For these reasons predictions from a multistate coxph model require the newdata argument.
The single curve you are seeing corresponds to the estimated survival of an individual whose covariates are the mean of the sample. This is almost certainly not what you want. What you should do is define and supply a newdata=
option to survfit()
. In that case each row of newdata
will define a single curve.
library(survival)
newdat = data.frame(sex = c(1,2))
coxfit2 = survfit(coxph(Surv(time, status)~sex, data=lung),
newdata = newdat)
plot(coxfit2)
Created on 2023-02-23 with reprex v2.0.2
However, it seems like ggfortify
has a problem when supplying newdata=
. I'm not sure if it's a bug or if you need to supply the data in some other format. You might need to create the plot yourself or ask the package author.