I've got the following data representing a survival function.
# A tibble: 53 x 2
month survival
<int> <dbl>
1 0 1.00
2 1 1.00
3 2 1.00
4 3 1.00
5 4 1.00
6 5 1.00
7 6 0.999
8 7 0.998
9 8 0.997
10 9 0.993
11 10 0.984
12 11 0.976
13 12 0.973
14 13 0.971
15 14 0.969
16 15 0.969
17 16 0.969
18 17 0.969
19 18 0.968
20 19 0.968
21 20 0.968
22 21 0.968
23 22 0.968
24 23 0.968
25 24 0.967
26 25 0.966
27 26 0.966
28 27 0.962
29 28 0.957
30 29 0.952
31 30 0.948
32 31 0.944
33 32 0.942
34 33 0.941
35 34 0.941
36 35 0.941
37 36 0.941
38 37 0.940
39 38 0.939
40 39 0.938
41 40 0.938
42 41 0.938
43 42 0.935
44 43 0.934
45 44 0.930
46 45 0.920
47 46 0.910
48 47 0.895
49 48 0.884
50 49 0.881
51 50 0.879
52 51 0.878
53 52 0.878
I would like to fit a distribution to a survival curve. To do so, first I plot the survival with respect to month. Then I use fitdist
function in order to fit a few distributions.
library('fitdistrplus')
library('flexsurv')
data <- tibble(month = 0:52, survival = c(1, 1, 1, 1, 1, 1, 0.999, 0.998,
0.997, 0.993, 0.984, 0.976, 0.973, 0.971, 0.969, 0.969, 0.969, 0.969, 0.968,
0.968, 0.968, 0.968, 0.968, 0.968,
0.967, 0.966, 0.966, 0.962, 0.957, 0.952, 0.948, 0.944,
0.942, 0.941, 0.941, 0.941, 0.941, 0.940, 0.939, 0.938,
0.938, 0.938, 0.935, 0.934, 0.930, 0.920, 0.910, 0.895,
0.884, 0.881, 0.879, 0.878, 0.878))
data %>% ggplot(aes(month, survival)) + geom_line()
fit_weibull <- fitdist(data[['survival']], 'weibull')
fit_llogis <- fitdist(data[['survival']], "llogis")
fit_log <- fitdist(data[['survival']], "logis")
fit_weibull$aic
fit_llogis$aic
fit_log$aic
According to AIC I should go for the Weibull distribution with a shape = 34.6167936
and scale = 0.9695298
. But I've got a problem with understanding how exactly should I use this distribution to calculate my estimated survival. I was confident that because S(t) = 1 - F(t)
I should just calculate 1 -pweibull(data[['month']], fit_weibull$estimate[['shape']], fit_weibull$estimate[['scale']])
, but it results in following vector:
[1] 1.00000000 0.05399642 0.00000000 0.00000000 0.00000000 0.00000000
0.00000000 0.00000000
[9] 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000
0.00000000 0.00000000
[17] 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000
0.00000000 0.00000000
[25] 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000
0.00000000 0.00000000
[33] 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000
0.00000000 0.00000000
[41] 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000
0.00000000 0.00000000
[49] 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000
So my understanding seems to be awfully wrong. How should I use fit_weibull
to estimate a survival and plot the estimated curve then?
You've got a non-standard version of survival analysis to deal with here. Normally survival analysis data is recorded in terms of discrete events (times at which individuals die) - that's what the flexsurv
package (which you loaded but as far as I can see didn't use) would expect.
Unfortunately fitdistrplus::fitdist
won't work for your data either - that would expect a distribution of survival times. Furthermore, even if you did have data on independent survival times, your data are censored (only 12% of the individuals have died/failed by the end of the time period); I don't know if fitdist
allows for censoring or not.
You probably won't be able to make very strong statistical conclusions about differences among curves, because you don't know (or at least you haven't said) how many independent trials are actually represented by this survival curve - e.g. was the initial cohort made up of 10, 100, or 10^6 individuals ... ?
However, you can fit curves as following:
dat <- data.frame(month = 0:52,
survival = c(1, 1, 1, 1, 1, 1, 0.999, 0.998,
0.997, 0.993, 0.984, 0.976, 0.973, 0.971, 0.969, 0.969, 0.969, 0.969, 0.968,
0.968, 0.968, 0.968, 0.968, 0.968,
0.967, 0.966, 0.966, 0.962, 0.957, 0.952, 0.948, 0.944,
0.942, 0.941, 0.941, 0.941, 0.941, 0.940, 0.939, 0.938,
0.938, 0.938, 0.935, 0.934, 0.930, 0.920, 0.910, 0.895,
0.884, 0.881, 0.879, 0.878, 0.878))
Fit by nonlinear least-squares (not a great statistical model, but adequate). Also: need good starting values.
n1 <- nls(survival~pweibull(month,exp(logshape),exp(logscale),
lower.tail=FALSE),
start=list(logshape=0,logscale=log(20)),data=dat)
n2 <- nls(pmin(survival,0.999)~plogis(month,location,exp(logscale),
lower.tail=FALSE),
start=list(location=40,logscale=log(20)),data=dat)
Plot outcomes:
par(bty="l",las=1)
plot(survival~month,data=dat,type="l")
lines(dat$month,predict(n1),col="red")
lines(dat$month,predict(n2),col="blue")