I am trying to run a Cox proportional hazard model to determine the effect of treatment and covariates on the survival of individual plant species. Previously when I ran coxph
with only the treatment (categorical/factor)
simacox <- coxph(Surv(Time, Event, type = c('right')) ~ Treatment, data = rsima)
It ran fine, but when I added in (continuous) covariates I kept getting an error message:
simacox <- coxph(Surv(Time, Event, type = c('right')) ~
Treatment+SLA+VLA+Thickness+Growth_Rate, data = rsima)
Warning message: In fitter(X, Y, strats, offset, init, control, weights = weights, : Ran out of iterations and did not converge
Here is the data set:
I'm not really sure is if it is caused by NA values or from another issue. I have looked into similar problems, but they generally arise because the Treatment
is continuous and seems to be a different issue.
Plot ID Subplot Treatment Column Row Species Time Event Growth_Rate Area SLA VLA Thickness
PC1 1 control A 7 SIMA 535 1 0.0132 NA NA NA NA
PC1 2 control C 2 SIMA 829 0 0.0532 6 123.5312982 1.307927088 0.1005
PC1 3 control D 2 SIMA 535 1 0.0329 NA NA NA NA
PC2 1 control A 7 SIMA 829 0 0.0236 0.75 192.6132404 1.49602026 0.135
PC2 2 control C 2 SIMA 829 1 0.0037 NA NA NA NA
PC2 3 control D 2 SIMA 535 1 0.0099 NA NA NA NA
PC3 1 control A 7 SIMA 152 1 0.0163 NA NA NA NA
PC3 2 control C 2 SIMA 829 0 0.058 1 185.3606789 1.311713087 0.135
PC3 3 control D 2 SIMA 829 0 0.0097 0.75 96.12967467 1.392643765 0.1735
PC4 1 control A 7 SIMA 152 1 0.0109 NA NA NA NA
PC4 2 control C 2 SIMA 120 1 0.0109 NA NA NA NA
PC4 3 control D 2 SIMA 120 1 0.0217 NA NA NA NA
PC5 1 control A 7 SIMA 92 1 0 NA NA NA NA
PC5 2 control C 2 SIMA 152 1 0.0109 NA NA NA NA
PC5 3 control D 2 SIMA 829 1 0.0009 NA NA NA NA
PS1 1 shelter A 7 SIMA 829 0 0.0121 3.25 96.12967467 1.392643765 0.1735
PS1 2 shelter C 2 SIMA 829 1 0.0009 NA NA NA NA
PS1 3 shelter D 2 SIMA 829 0 0.0435 11.75 119.0672131 1.26393576 0.2495
PS2 1 shelter A 7 SIMA 829 0 0.0508 6 128.8442116 1.744927272 0.1417
PS2 2 shelter C 2 SIMA 829 0 0.0193 1 163.722709 1.987793669 0.1045
PS2 3 shelter D 2 SIMA 829 0 0.0484 6.5 134.4099228 1.589451631 0.18
PS3 1 shelter A 7 SIMA 829 0 0.0363 9.5 184.2795579 1.450538059 0.1035
PS3 2 shelter C 2 SIMA 829 0 0.058 11 96.76593176 1.501929992 0.08
PS3 3 shelter D 2 SIMA 829 0 0.0193 2.25 124.317571 3.516426012 0.1295
PS4 1 shelter A 7 SIMA 829 0 0.0411 4.5 113.088867 2.203327018 0.149
PS4 2 shelter C 2 SIMA 535 1 0.0263 NA NA NA NA
PS4 3 shelter D 2 SIMA 829 0 0.058 11 31.44098888 1.714225616 0.1595
PS5 1 shelter A 7 SIMA 829 0 0.0363 11.5 155.3209302 1.308096836 0.23875
PS5 2 shelter C 2 SIMA 829 0 0.0048 0.25 171.0465116 2.135961931 0.104
PS5 3 shelter D 2 SIMA 829 0 0.0266 5 178.9407945 1.599492384 0.0975
PW1 1 watered A 7 SIMA 829 1 0.0056 NA NA NA NA
PW1 2 watered C 2 SIMA 829 0 0.0484 6.5 150.7782165 1.956811087 0.159
PW1 3 watered D 2 SIMA 829 0 0.0181 3 158.1184404 1.94474398 0.1935
PW2 1 watered A 7 SIMA 829 0 0.0351 8.5 148.9020752 1.482003075 0.2405
PW2 2 watered C 2 SIMA 829 0 0.0508 1.5 170.3944295 1.653449107 0.127
PW2 3 watered D 2 SIMA 829 1 0.0009 NA NA NA NA
PW3 1 watered A 7 SIMA 829 0 0.0073 1 159.8682043 1.594187964 0.224
PW3 2 watered C 2 SIMA 120 1 0.0217 NA NA NA NA
PW3 3 watered D 2 SIMA 829 0 0.0919 25 146.6362786 1.694286556 0.1325
PW4 1 watered A 7 SIMA 120 1 0.0109 NA NA NA NA
PW4 2 watered C 2 SIMA 829 1 0.0009 NA NA NA NA
PW4 3 watered D 2 SIMA 152 1 0.0163 NA NA NA NA
PW5 1 watered A 7 SIMA 829 1 0.0009 NA NA NA NA
PW5 2 watered C 2 SIMA 535 1 0.0266 1.5 162.8057554 2.065105317 0.94
PW5 3 watered D 2 SIMA 829 0 0.058 4 80.37696758 1.831219479 0.1195
The problem is actually with Thickness
; it's easy to verify that
fit <- coxph(Surv(Time, Event) ~ Thickness, data = rsima)
produces the warning
Warning message: In fitter(X, Y, strats, offset, init, control, weights = weights, : Ran out of iterations and did not converge
We can get some insight into convergence issues from ?coxph
:
In certain data cases the actual MLE estimate of a coefficient is infinity, e.g., a dichotomous variable where one of the groups has no events. When this happens the associated coefficient grows at a steady pace and a race condition will exist in the fitting routine: either the log likelihood converges, the information matrix becomes effectively singular, an argument to exp becomes too large for the computer hardware, or the maximum number of interactions is exceeded. (Nearly always the first occurs.) The routine attempts to detect when this has happened, not always successfully. The primary consequence for he user is that the Wald statistic = coefficient/se(coefficient) is not valid in this case and should be ignored; the likelihood ratio and score tests remain valid however.
If we take a look at rsima$Thickness
we notice that most values are small (in the range 0.08 <= Thickness <= 0.2495
) with one single value being Thickness = 0.94
. This is very similar to the case described in the documentation, where Thickness
is basically a discrete variable (with levels "low" and "high") and one group having nearly no events (the "high" group has only one event).
Based on this post on Cross Validated, it's useful to visualise the effect by plotting
library(survminer)
ggsurvplot(survfit(Surv(Time, Event) ~ (Thickness > median(Thickness, na.rm = T)), data = df), data = df)
What we're doing here is plotting the survival probability as a function of a dichotomised Thickness
, with Thickness
being either smaller than its median value (the red curve), or larger (the blue curve).
You can see the effect of Thickness
on the survival probability, or rather, the absence of an effect of Thickness
. For example, notice how there are no Event = 1
cases for small Thickness
values, and there is only one Event = 1
case for large Thickness
values.
In terms of fitting the model, it is impossible to obtain a robust estimate of the Thickness
effect on the survival probability, and Thickness
should be removed from the model prior to exploring additional continuous/discrete covariates.