Search code examples
rggplot2curve-fittingsmoothing

kaplan meier curve starting above 1 on the y-axis


I have computed a smoothed Kaplan Meier plot in R, by following this example in the comments by @agdamsbo link

I have copy pasted the same code as in the comment here:

library(tidyverse)
library(survival)
library(purrr)
library(ggsurvfit)

## Data
df <- survfit(Surv(time, status) ~ surg, data = ggsurvfit::df_colon) |> ggsurvfit::tidy_survfit(type = "survival")

df_split <- split(df,df$strata)

df_smoothed <- purrr::reduce(lapply(c("estimate","conf.low", "conf.high"), function(j) {
  do.call(rbind,
          lapply(seq_along(df_split), function(i) {
            nms <- names(df_split)[i]
            y <-
              predict(mgcv::gam(as.formula(paste0(
                j[[1]], " ~ s(time, bs = 'cs')"
              )), data = df_split[[i]]))
            df <- data.frame(df_split[[i]]$time, y, nms)
            names(df) <- c("time", paste0(j[[1]], ".smooth"), "strata")
            df
          }))
}),dplyr::full_join) |> full_join(df)
#> Joining with `by = join_by(time, strata)`
#> Joining with `by = join_by(time, strata)`
#> Joining with `by = join_by(time, strata)`

## Plotting
ggplot(data=df_smoothed) + 
  geom_line(aes(x=time, y=estimate.smooth, color = strata))+
  geom_ribbon(aes(x=time, ymin = conf.low.smooth, ymax = conf.high.smooth, fill = strata), alpha = 0.50)

The above codes works perfect, and I get a pretty smoothed kaplan meier plot, for each of the stratum (I have 4 groups). I can't use step functions as they can reveal micro data. Also, I can't share the smoothed Kaplan Meier plot from the above codes.

However, the curves for each stratum starts slightly above 1 on the y-axis. which does not make sense, as my sample size should be 100% at the beginning. I think it has something to do with the way it is coded due to the smoothing part of the curves.

What I have tried to do to fix the problem: I have tried to incorporate codes from the cobs package into the codes from above, "to force" the curves to be flatten at 1 and to start from there. It almost worked for me, however, the KM plot I get only shows one stratum (I have 4 in total). Also, I have added weights in my survfit function, and I don't know how to incorporate the weights when using cobs.

Here is exactly what I have tried to do to fix it with cobs in R:

library(tidyverse)
library(survival)
library(purrr)
library(ggsurvfit)

## Data
df <- survfit(Surv(time, outcome) ~ exposure+strata(sex), data = my_data) |> ggsurvfit::tidy_survfit(type = "survival")

df_split <- split(df,df$strata)

df_smoothed <- purrr::reduce(lapply(c("estimate","conf.low", "conf.high"), function(j) {
  do.call(rbind,
          lapply(seq_along(df_split), function(i) {
            nms <- names(df_split)[i]
            y <-
              predict(mgcv::gam(as.formula(paste0(
                j[[1]], " ~ s(time, bs = 'cs')"
              )), data = df_split[[i]]))
            df <- data.frame(df_split[[i]]$time, y, nms)
            names(df) <- c("time", paste0(j[[1]], ".smooth"), "strata")
            df
          }))
}),dplyr::full_join) |> full_join(df)
#> Joining with `by = join_by(time, strata)`
#> Joining with `by = join_by(time, strata)`
#> Joining with `by = join_by(time, strata)`


library(cobs) ##NEW CODING STARTS FROM HERE

pw <- rbind(c( 1,min(df_smoothed1$time),1), 
          c(-1,max(df_smoothed$time),0)) 
x <- df_smoothed$time
y <- df_smoothed$estimate.smooth
ft <- cobs(x,y, constraint="decrease", nknots=4,pointwise= con2,
        degree = 2)
fit <- predict(ft, x) [, 'fit']

df_smoothed$x <- x
df_smoothed$y <- y
df_smoothed$fit <- fit

## Plotting
ggplot(data=df_smoothed, aes(x,y, color = strata)) + 
  geom_line(aes(y=fit))+
  geom_ribbon(aes(x=time, ymin = conf.low.smooth, ymax = conf.high.smooth, fill = strata), alpha = 0.50)

Solution

  • Recognizing the above important caveats about whether or not this type of smoothing makes sense for a KM curve, you could do something like the below. You should note however, there is no guarrantee that the smoothed curve is monotonically decreasing - you would need further constraints to meet that requirement.

    1. Data Prep

    • get dataset
    • create survfit object
    • create data frame of tidied values
    ## Data
    data = ggsurvfit::df_colon %>% filter(surg=="Limited Time Since Surgery")
    
    # create survfit object
    df_surv <- survfit(Surv(time, status) ~ surg, data = data)
    
    # tidy using tidy_survfit
    df <- ggsurvfit::tidy_survfit(df_surv, type = "survival")
    

    2. Create function that constrains a smooth to go through (0,1), and provides prediction of that smooth

    constrain_0_1_smooth_km <- function(df,y,nknots=30, xinterval=c(0,10)) {
      knots <- data.frame(time = seq(xinterval[1],xinterval[2],length=nknots))
      sm <- smoothCon(s(time,k=nknots,bs="cr"),df,knots=knots)[[1]]
      sm$X[,1] <- 0
      sm$S[[1]][1,] <- 0
      sm$S[[1]][,1] <- 0
      predict(gam(df[[y]] ~ sm$X - 1 + offset(off),paraPen=list(X=list(sm$S[[1]]))))
    }
    

    3. Plot the original km, and the smoothed estimate along with smoothed conf.high and conf.low

    # plot the KM curve
    plot(df_surv, conf.int = F)
    # add the smoothed version constrained at 0,1
    lines(x = df$time, y=constrain_0_1_smooth_km(df, "estimate"),col="red")
    lines(x = df$time, y=constrain_0_1_smooth_km(df, "conf.high"),col="blue")
    lines(x = df$time, y=constrain_0_1_smooth_km(df, "conf.low"),col="blue")
    

    Output:

    enter image description here

    You can instead plot with ggplot2 package of course, but I'll leave that to you, as I think the focus of the question is on how to constrain the smooth.