
Survival Times for Kaplan Meier Curve


I am working on a Kaplan-Meier analysis in R Markdown.

I would like to know if I am calculating the survival time correctly. My data frame is MOWHTO_COMPLICATIONS.

It has the following variables: D_SURGERY (date of surgery), REV_ARTHROPLASTY (date of revision), and CENSOR_STATUS (either 0 = censored or 1 = revised).

The date of revision is filled in for the revised cases only. The remaining cells are empty.

I calculated the survival time in years using the following code:

MOWHTO_COMPLICATIONS$SURVIVAL_TIME_YEARS = as.numeric(difftime(MOWHTO_COMPLICATIONS$REV_ARTHROPLASTY, MOWHTO_COMPLICATIONS$D_SURGERY, units = "weeks"))/52.25

Then created the curve using the following code:

survfit2(Surv(SURVIVAL_TIME_YEARS, CENSOR_STATUS) ~ 1, data = MOWHTO_COMPLICATIONS) %>% 
  ggsurvfit() +
  labs(
    x = "Years",
    y = "Overall Survival Probability"
  )+ 
  add_confidence_interval()+
  add_risktable()

Then I wanted to see the survival at 10 years and used the following code:

summary(survfit(Surv(SURVIVAL_TIME_YEARS, CENSOR_STATUS) ~ 1, data = MOWHTO_COMPLICATIONS), times = 10)

And I got the following results:

Call: survfit(formula = Surv(SURVIVAL_TIME_YEARS, CENSOR_STATUS) ~ 1, data = MOWHTO_COMPLICATIONS)

678 observations deleted due to missingness
 time n.risk n.event survival std.err lower 95% CI upper 95% CI
   10      8      55    0.127  0.0419       0.0665        0.243

So the survival is 0.127, that is, 13% at 10 years. That cannot be correct. Shouldn't it be over 80 or 90%?

What am I doing wrong? Is it the survival time? Should I have dates for the cases that were not revised? And if so, which dates should they be?

Any help would be very much appreciated.


Solution

  • The empty cells need to have a date in them; otherwise, when you take the difftime, the result for those rows will be NA.

    For example:

    as.numeric(difftime(ISOdate(2001, 4, 26), NA, units = "weeks"))/52.25
    

    returns NA

    Instead, for the cases where the event of interest did not occur, fill in the last date the subject was observed. That way, when you subtract the dates, you will get a time value for every row.

    There is a hint in this part of the output:

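    678 observations deleted due to missingness

    survfit dropped those records because they didn't have a time value. With the non-revised cases removed, the curve is estimated almost entirely from patients who were revised, which is why the survival looks so low.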
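
As a rough illustration of the fix, here is a minimal sketch on made-up data. The toy data frame, the D_LAST_FOLLOWUP column and the D_END helper column are hypothetical names used only for this example; your real data would need its own last-observation date for each non-revised case. The sketch also divides days by 365.25 rather than weeks by 52.25, which is a slightly closer approximation of a year.

```r
library(survival)

# Toy data for illustration only. D_LAST_FOLLOWUP is a hypothetical column
# holding the last date each patient was known to be unrevised.
toy <- data.frame(
  D_SURGERY        = as.Date(c("2005-01-10", "2006-03-15", "2007-06-01", "2008-09-20")),
  REV_ARTHROPLASTY = as.Date(c("2009-01-10", NA, NA, "2015-09-20")),
  D_LAST_FOLLOWUP  = as.Date(c("2009-01-10", "2020-03-15", "2019-06-01", "2015-09-20")),
  CENSOR_STATUS    = c(1, 0, 0, 1)
)

# End of observation: revision date for revised cases,
# last follow-up date for everyone else.
toy$D_END <- toy$REV_ARTHROPLASTY
not_revised <- is.na(toy$D_END)
toy$D_END[not_revised] <- toy$D_LAST_FOLLOWUP[not_revised]

# Survival time in years; every row now has a value.
toy$SURVIVAL_TIME_YEARS <-
  as.numeric(difftime(toy$D_END, toy$D_SURGERY, units = "days")) / 365.25

# Same model as in the question, now fitted on all rows.
fit <- survfit(Surv(SURVIVAL_TIME_YEARS, CENSOR_STATUS) ~ 1, data = toy)
summary(fit, times = 10)
```

With the censored cases kept in, the "observations deleted due to missingness" line should disappear from the summary, and the number at risk reflects the whole cohort rather than only the revised patients.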