I have computed a smoothed Kaplan Meier plot in R, by following this example in the comments by @agdamsbo link
I have copy pasted the same code as in the comment here:
library(tidyverse)
library(survival)
library(purrr)
library(ggsurvfit)
## Data
df <- survfit(Surv(time, status) ~ surg, data = ggsurvfit::df_colon) |> ggsurvfit::tidy_survfit(type = "survival")
df_split <- split(df,df$strata)
df_smoothed <- purrr::reduce(lapply(c("estimate","conf.low", "conf.high"), function(j) {
do.call(rbind,
lapply(seq_along(df_split), function(i) {
nms <- names(df_split)[i]
y <-
predict(mgcv::gam(as.formula(paste0(
j[[1]], " ~ s(time, bs = 'cs')"
)), data = df_split[[i]]))
df <- data.frame(df_split[[i]]$time, y, nms)
names(df) <- c("time", paste0(j[[1]], ".smooth"), "strata")
df
}))
}),dplyr::full_join) |> full_join(df)
#> Joining with `by = join_by(time, strata)`
#> Joining with `by = join_by(time, strata)`
#> Joining with `by = join_by(time, strata)`
## Plotting
ggplot(data=df_smoothed) +
geom_line(aes(x=time, y=estimate.smooth, color = strata))+
geom_ribbon(aes(x=time, ymin = conf.low.smooth, ymax = conf.high.smooth, fill = strata), alpha = 0.50)
The above codes works perfect, and I get a pretty smoothed kaplan meier plot, for each of the stratum (I have 4 groups). I can't use step functions as they can reveal micro data. Also, I can't share the smoothed Kaplan Meier plot from the above codes.
However, the curves for each stratum starts slightly above 1 on the y-axis. which does not make sense, as my sample size should be 100% at the beginning. I think it has something to do with the way it is coded due to the smoothing part of the curves.
What I have tried to do to fix the problem:
I have tried to incorporate codes from the cobs package
into the codes from above, "to force" the curves to be flatten at 1 and to start from there. It almost worked for me, however, the KM plot I get only shows one stratum (I have 4 in total). Also, I have added weights in my survfit function, and I don't know how to incorporate the weights when using cobs
.
Here is exactly what I have tried to do to fix it with cobs
in R:
library(tidyverse)
library(survival)
library(purrr)
library(ggsurvfit)
## Data
df <- survfit(Surv(time, outcome) ~ exposure+strata(sex), data = my_data) |> ggsurvfit::tidy_survfit(type = "survival")
df_split <- split(df,df$strata)
df_smoothed <- purrr::reduce(lapply(c("estimate","conf.low", "conf.high"), function(j) {
do.call(rbind,
lapply(seq_along(df_split), function(i) {
nms <- names(df_split)[i]
y <-
predict(mgcv::gam(as.formula(paste0(
j[[1]], " ~ s(time, bs = 'cs')"
)), data = df_split[[i]]))
df <- data.frame(df_split[[i]]$time, y, nms)
names(df) <- c("time", paste0(j[[1]], ".smooth"), "strata")
df
}))
}),dplyr::full_join) |> full_join(df)
#> Joining with `by = join_by(time, strata)`
#> Joining with `by = join_by(time, strata)`
#> Joining with `by = join_by(time, strata)`
library(cobs) ##NEW CODING STARTS FROM HERE
pw <- rbind(c( 1,min(df_smoothed1$time),1),
c(-1,max(df_smoothed$time),0))
x <- df_smoothed$time
y <- df_smoothed$estimate.smooth
ft <- cobs(x,y, constraint="decrease", nknots=4,pointwise= con2,
degree = 2)
fit <- predict(ft, x) [, 'fit']
df_smoothed$x <- x
df_smoothed$y <- y
df_smoothed$fit <- fit
## Plotting
ggplot(data=df_smoothed, aes(x,y, color = strata)) +
geom_line(aes(y=fit))+
geom_ribbon(aes(x=time, ymin = conf.low.smooth, ymax = conf.high.smooth, fill = strata), alpha = 0.50)
Recognizing the above important caveats about whether or not this type of smoothing makes sense for a KM curve, you could do something like the below. You should note however, there is no guarrantee that the smoothed curve is monotonically decreasing - you would need further constraints to meet that requirement.
survfit
object## Data
data = ggsurvfit::df_colon %>% filter(surg=="Limited Time Since Surgery")
# create survfit object
df_surv <- survfit(Surv(time, status) ~ surg, data = data)
# tidy using tidy_survfit
df <- ggsurvfit::tidy_survfit(df_surv, type = "survival")
constrain_0_1_smooth_km <- function(df,y,nknots=30, xinterval=c(0,10)) {
knots <- data.frame(time = seq(xinterval[1],xinterval[2],length=nknots))
sm <- smoothCon(s(time,k=nknots,bs="cr"),df,knots=knots)[[1]]
sm$X[,1] <- 0
sm$S[[1]][1,] <- 0
sm$S[[1]][,1] <- 0
predict(gam(df[[y]] ~ sm$X - 1 + offset(off),paraPen=list(X=list(sm$S[[1]]))))
}
estimate
along with smoothed conf.high
and conf.low
# plot the KM curve
plot(df_surv, conf.int = F)
# add the smoothed version constrained at 0,1
lines(x = df$time, y=constrain_0_1_smooth_km(df, "estimate"),col="red")
lines(x = df$time, y=constrain_0_1_smooth_km(df, "conf.high"),col="blue")
lines(x = df$time, y=constrain_0_1_smooth_km(df, "conf.low"),col="blue")
Output:
You can instead plot with ggplot2
package of course, but I'll leave that to you, as I think the focus of the question is on how to constrain the smooth.