Search code examples
rrandom-forestsurvival

Do results of survival analysis only pertain to the observations analyzed?


enter image description here

Hey guys, so I taught myself time-to-event analysis recently and I need some help understanding it. I made some Kaplan-Meier survival curves.

Sure, the number of observations within each node is small but let's pretend that I have plenty.

K <- HF %>% 
  filter(serum_creatinine <= 1.8, ejection_fraction <= 25)


## Call: survfit(formula = Surv(time, DEATH_EVENT) ~ 1, data = K)
## 
##  time n.risk n.event survival std.err lower 95% CI upper 95% CI
##    20     36       5    0.881  0.0500        0.788        0.985
##    45     33       3    0.808  0.0612        0.696        0.937
##    60     31       3    0.734  0.0688        0.611        0.882
##    80     23       6    0.587  0.0768        0.454        0.759
##   100     17       1    0.562  0.0776        0.429        0.736
##   110     17       0    0.562  0.0776        0.429        0.736
##   120     16       1    0.529  0.0798        0.393        0.711
##   130     14       0    0.529  0.0798        0.393        0.711
##   140     14       0    0.529  0.0798        0.393        0.711
##   150     13       1    0.488  0.0834        0.349        0.682

If someone were to ask me about the third node, would the following statements be valid?:

For any new patient that walks into this hospital with <= 1.8 in Serum_Creatine & <= 25 in Ejection Fraction, their probability of survival is 53% after 140 days.

What about:

The survival distributions for the samples analyzed, and no other future incoming samples, are visualized above.

I want to make sure these statements are correct. I would also like to know if logistic regression could be used to predict the binary variable DEATH_EVENT? Since the TIME variable contributes to how much weight one patient's death at 20 days has over another patient's death at 175 days, I understand that this needs to be accounted for.

If logistic regression can be used, does that imply anything over keeping/removing variable TIME?


Solution

  • Here are some thoughts:

    Logistic regression is not appropriate in your case. As it is not the correct method for time to event analysis.

    • If the clinical outcome observed is “either-or,” such as if a patient suffers an MI or not, logistic regression can be used.

    • However, if the information on the time to MI is the observed outcome, data are analyzed using statistical methods for survival analysis.

    Text from here

    If you want to use a regression model in survival analysis then you should use a COX PROPORTIONAL HAZARDS MODEL. To understand the difference of a Kaplan-Meier analysis and Cox proportional hazards model you should understand both of them.

    The next step would be to understand what is a univariable in contrast to a multivariable Cox proportional hazard model.

    At the end you should understand all 3 methods(Kaplan-Meier, Cox univariable and Cox multivariable) then you can answer your question if this is a valid statement:

    • For any new patient that walks into this hospital with <= 1.8 in Serum_Creatine & <= 25 in Ejection Fraction, their probability of survival is 53% after 140 days.

    There is nothing wrong to state the results of a subgroup of a Kaplan-Meier method. But it has a different value if the statement comes from a multivariable Cox regression analysis.