Search code examples
rsurvminer

Why does the surv_pvalue (or any other) function not work inside a custom function in R?


To generate p-values for the differences between two survival curves, I need to perform a tedious series of steps, which need to be repeated with all combinations of two out of three values that are represented in one column of the starting data frame, say groups A, B and C:

# Load packages
library(survival)
library(survminer)

# Generate practice dataset
surv_time <- sample(100:200, size = 50, replace = TRUE)
surv_group <- sample(c("A", "B", "C"), size = 50, replace = TRUE)
surv_status <- rep(1, 50)
surv_data <- data.frame(Time = as.numeric(surv_time), Group = surv_group,
                        Status = surv_status)

# Manual for combination of groups A and B
s_data_AB <- surv_data[surv_data$Group %in% c("A", "B"), ]
s_obj_AB <- Surv(s_data_AB$Time, s_data_AB$Status)
s_curve_AB <- survfit(s_obj_AB ~ Group, data = s_data_AB)
s_pval_AB <- surv_pvalue(s_curve_AB)
s_pval_AB

This works perfectly fine, but it is inconvenient to do this separately for (A+C) and (B+C). So I tried to automate this with the following custom function:

# Custom function
s_curve_fun <- function(groups_of_interest) {
  s_data <- surv_data[surv_data$Group %in% groups_of_interest, ]
  s_obj <- Surv(s_data$Time, s_data$Status)
  s_curve <- survfit(s_obj ~ Group, data = s_data)
  print(s_curve)
  s_pval <- surv_pvalue(s_curve)
  return(s_pval)
}

# Execute function
s_curve_fun(c("A", "B"))

This throws an error after the print call works, as calling the surv_pvalue function at the end throws the following error:

Error in eval(fit$call$data) : object 's_data' not found

Interestingly, calling the function survfit right beforehand works fine, even though survfit also uses the object s_data. When I export the output of survfit from s_curve_fun and use it as input for surv_pvalue, I get the same error. When I define s_data outside of s_curve_fun, everything works (but obviously not for the desired Group combination).

I am very curious to understand why this does not work and how to fix it. I'd appreciate any feedback!


Solution

  • I think the problem is that you are not calling survfit in the manner it was intended. The Surv-object should have the column names, but not the dataset name in it.

    You also need to provide survminer::surv_pvalue a dataset as a second parameter

    Try (works for me):

    s_curve_fun <- function(groups_of_interest) {
        s_data <- surv_data[surv_data$Group %in% groups_of_interest, ]
        # s_obj <- Surv(s_data$Time, s_data$Status)  
        # Not good to build Surv object outside survfit call
        # also not good to use `$data`
        s_curve <- survfit( Surv(Time, Status) ~ Group, data = s_data)
        print(s_curve)
        # need a data object to give to surv_pvalue
        s_pval <- surv_pvalue(s_curve, s_data)
        return(s_pval)
       }
    

    Using construction of Surv-objects outside of the survfit call and giving it arguments using $-extraction from data objects that are outside the function screws up where survfit and its associated functions expect to find their environment contents and do evaluations. I'm not sure whether that widespread but error-prone practice of creating Surv-objects outside the calling environment of survfit comes from. Looking at the code for surv_pvalue, I'm wondering if that might be a problem in that calling framework as well.