Do results of survival analysis only pertain to the observations analyzed?

Hey guys, so I taught myself time-to-event analysis recently and I need some help understanding it. I made some Kaplan-Meier survival curves.

Sure, the number of observations within each node is small but let's pretend that I have plenty.

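library(dplyr)  # provides %>% and filter()
# Subgroup: serum creatinine <= 1.8 and ejection fraction <= 25
K <- HF %>%
  filter(serum_creatinine <= 1.8, ejection_fraction <= 25)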


## Call: survfit(formula = Surv(time, DEATH_EVENT) ~ 1, data = K)
## 
##  time n.risk n.event survival std.err lower 95% CI upper 95% CI
##    20     36       5    0.881  0.0500        0.788        0.985
##    45     33       3    0.808  0.0612        0.696        0.937
##    60     31       3    0.734  0.0688        0.611        0.882
##    80     23       6    0.587  0.0768        0.454        0.759
##   100     17       1    0.562  0.0776        0.429        0.736
##   110     17       0    0.562  0.0776        0.429        0.736
##   120     16       1    0.529  0.0798        0.393        0.711
##   130     14       0    0.529  0.0798        0.393        0.711
##   140     14       0    0.529  0.0798        0.393        0.711
##   150     13       1    0.488  0.0834        0.349        0.682

If someone were to ask me about the third node, would the following statement be valid?

For any new patient that walks into this hospital with serum_creatinine <= 1.8 and ejection_fraction <= 25, their probability of survival is 53% after 140 days.

What about:

The survival distributions for the samples analyzed, and no other future incoming samples, are visualized above.

I want to make sure these statements are correct. I would also like to know whether logistic regression could be used to predict the binary variable DEATH_EVENT. A patient's death at 20 days should presumably carry different weight than another patient's death at 175 days, so I understand that the TIME variable needs to be accounted for somehow.

If logistic regression can be used, does that imply anything about keeping or removing the TIME variable?

Solution

Here are some thoughts:

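Logistic regression is not appropriate in your case, because it is not the right method for time-to-event analysis. It only models whether the event happened, not when, and it cannot properly handle censored patients (those who were still alive when their follow-up ended).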

If the clinical outcome observed is “either-or,” such as if a patient suffers an MI or not, logistic regression can be used.
However, if the information on the time to MI is the observed outcome, data are analyzed using statistical methods for survival analysis.

Text from here
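To make the point about censoring concrete, here is a minimal sketch in R. It uses simulated data, not your HF data: the sample size, death rate, and follow-up lengths are all made up for illustration. It compares the "either-or" view of the outcome with the Kaplan-Meier estimate at day 140.

```r
library(survival)

set.seed(7)
n <- 200
event_time  <- rexp(n, rate = 1 / 150)   # made-up true death times
censor_time <- runif(n, 10, 200)         # made-up follow-up lengths
d <- data.frame(
  time        = pmin(event_time, censor_time),
  DEATH_EVENT = as.integer(event_time <= censor_time)
)

# Naive view: anyone not observed to die by day 140 counts as a survivor,
# including patients who were only followed for a few weeks.
naive_140 <- 1 - mean(d$DEATH_EVENT == 1 & d$time <= 140)

# Kaplan-Meier view: censored patients leave the risk set when follow-up ends.
km_fit <- survfit(Surv(time, DEATH_EVENT) ~ 1, data = d)
km_140 <- summary(km_fit, times = 140)$surv

c(naive = naive_140, kaplan_meier = km_140)
```

The naive figure can never be lower than the Kaplan-Meier one, because every patient whose follow-up stopped early is silently counted as alive. That is the information a plain yes/no outcome throws away.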

If you want to use a regression model in survival analysis, then you should use a Cox proportional hazards model. To understand the difference between a Kaplan-Meier analysis and a Cox proportional hazards model, you need to understand both of them.

The next step would be to understand the difference between a univariable and a multivariable Cox proportional hazards model.

Once you understand all three methods (Kaplan-Meier, univariable Cox, and multivariable Cox), you can answer your own question about whether this is a valid statement:
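As a rough sketch of what fitting such a model looks like in R, the example below uses coxph() from the survival package. The data frame sim is a simulated stand-in, not your HF data. The column names mirror yours, but the values and the effect sizes used to generate them are assumptions made up for illustration.

```r
library(survival)

# Simulated stand-in for the HF data; every value here is made up.
set.seed(1)
n <- 500
sim <- data.frame(
  serum_creatinine  = runif(n, 0.5, 3),
  ejection_fraction = runif(n, 15, 60)
)

# Assumed true effects: higher creatinine raises the hazard,
# higher ejection fraction lowers it.
rate <- 0.005 * exp(0.5 * sim$serum_creatinine - 0.03 * sim$ejection_fraction)
event_time  <- rexp(n, rate)
censor_time <- runif(n, 50, 300)   # end of follow-up for each patient
sim$time        <- pmin(event_time, censor_time)
sim$DEATH_EVENT <- as.integer(event_time <= censor_time)

# Multivariable Cox model: both covariates enter as continuous predictors.
cox_fit <- coxph(Surv(time, DEATH_EVENT) ~ serum_creatinine + ejection_fraction,
                 data = sim)
summary(cox_fit)   # hazard ratios are in the exp(coef) column
```

Dropping one of the two covariates from the formula gives a univariable model. Note that the covariates stay continuous here instead of being cut at 1.8 and 25.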

For any new patient that walks into this hospital with serum_creatinine <= 1.8 and ejection_fraction <= 25, their probability of survival is 53% after 140 days.

There is nothing wrong with reporting the Kaplan-Meier result for a subgroup. But the statement carries different weight if it comes from a multivariable Cox regression analysis, which adjusts for the other covariates.
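Whichever method the statement comes from, it helps to quote the uncertainty along with the point estimate. In your own output the day-140 row reads 0.529 with a 95% CI of 0.393 to 0.711, so "53%" is far from exact. Below is a minimal sketch of pulling that row out of a fit with summary(). The data frame K_sim is simulated, and its size and rates are made up, so the numbers will not match yours.

```r
library(survival)

# Simulated subgroup; values are made up for illustration.
set.seed(42)
n <- 40
event_time  <- rexp(n, rate = 1 / 200)
censor_time <- runif(n, 100, 250)
K_sim <- data.frame(
  time        = pmin(event_time, censor_time),
  DEATH_EVENT = as.integer(event_time <= censor_time)
)

km_fit <- survfit(Surv(time, DEATH_EVENT) ~ 1, data = K_sim)

# Estimated survival probability at day 140 with its 95% confidence interval
s140 <- summary(km_fit, times = 140)
data.frame(time = s140$time, survival = s140$surv,
           lower = s140$lower, upper = s140$upper)
```

With your data you would pass your own fit on K instead of km_fit. The interval describes how well the survival curve of the underlying patient population is estimated. It only applies to new patients to the extent that they resemble the ones in the sample.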