
Creating a function passing its arguments to the Surv function (or any other)


Please consider the following:

To create a survival curve one can make use of the survfit function of the survival package.

My aim is to write a function that (amongst other things) creates such a curve, but the function should work with different data.frames whose column names also differ. The grouping variable will also depend on the respective dataset.

I manage to pass different data.frame names to the function, but providing the column names for the survfit and Surv functions does not work for me.

Any help is greatly appreciated.

In my opinion this is a different problem from simply passing a data.frame column name to a function as discussed here: Pass a data.frame column name to a function

# required libraries
library(survival)
library(flexsurv)

#### Examples that work without own function ===================================
# survfit with lung data
survfit(Surv(time = time, event = status) ~ 1, data = lung)
#> Call: survfit(formula = Surv(time = time, event = status) ~ 1, data = lung)
#> 
#>       n  events  median 0.95LCL 0.95UCL 
#>     228     165     310     285     363
survfit(Surv(time = time, event = status) ~ sex, data = lung)
#> Call: survfit(formula = Surv(time = time, event = status) ~ sex, data = lung)
#> 
#>         n events median 0.95LCL 0.95UCL
#> sex=1 138    112    270     212     310
#> sex=2  90     53    426     348     550

# survfit with bc data
survfit(Surv(time = rectime, event = censrec) ~ 1, data = bc)
#> Call: survfit(formula = Surv(time = rectime, event = censrec) ~ 1, 
#>     data = bc)
#> 
#>       n  events  median 0.95LCL 0.95UCL 
#>     686     299    1807    1587    2030

# Create a generic function that takes dataset-specific arguments
SurvFun <- function(fun.time, fun.event, grouping = 1, fun.dat){
  survfit(Surv(time = fun.time, event = fun.event) ~ grouping, data = fun.dat)
}

#### Own function that doesn't work ============================================
# This should work for data = lung
SurvFun(fun.time = time, fun.event = status, grouping = 1, fun.dat = lung)
#> Error in Surv(time = fun.time, event = fun.event): Time variable is not numeric

Created on 2018-07-05 by the reprex package (v0.2.0).


Solution

  • When column names aren't surrounded by quotes, they are passed as symbols rather than as values. Symbols are much harder to pass around than ordinary variables, and the same applies to formulas: Surv() looks for an object literally named fun.time instead of the column you meant. You need some meta-programming to make this work. Here's one way to rewrite your function so that it works:

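    SurvFun <- function(fun.time, fun.event, grouping = 1, fun.dat) {
      # Capture each argument as an unevaluated expression (symbol or constant)
      params <- list(fun.time = substitute(fun.time),
        fun.event = substitute(fun.event),
        grouping = substitute(grouping), 
        fun.dat = substitute(fun.dat))
      # Splice the captured expressions into the survfit() call
      expr <- substitute(survfit(Surv(time = fun.time, event = fun.event) ~ grouping, 
        data = fun.dat), params)
      # Evaluate the finished call in the caller's environment
      eval.parent(expr)
    }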
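
As a quick check, the wrapper can be called with the unquoted column names from the question. This is only a minimal sketch: it assumes the survival package is installed, uses its lung data, and repeats the function so the block runs on its own. Note that the last entry of params must capture fun.dat, the argument's actual name. The object names fit_all and fit_sex are made up for this illustration.

```r
library(survival)

# Wrapper as described in the answer; every argument is captured unevaluated.
SurvFun <- function(fun.time, fun.event, grouping = 1, fun.dat) {
  params <- list(fun.time = substitute(fun.time),
                 fun.event = substitute(fun.event),
                 grouping = substitute(grouping),
                 fun.dat = substitute(fun.dat))
  # Build the call survfit(Surv(time = <time>, event = <event>) ~ <group>, data = <data>)
  expr <- substitute(survfit(Surv(time = fun.time, event = fun.event) ~ grouping,
                             data = fun.dat), params)
  # Evaluate it where SurvFun() was called, so the data set is found there
  eval.parent(expr)
}

# Overall curve: grouping falls back to its default of 1
fit_all <- SurvFun(fun.time = time, fun.event = status, fun.dat = lung)

# One curve per level of sex
fit_sex <- SurvFun(fun.time = time, fun.event = status, grouping = sex, fun.dat = lung)

# The stored call shows the column names that were spliced in
print(fit_sex$call)
```

Because the call is assembled before it is evaluated, the printed call should show the real column names (time, status, sex) rather than the wrapper's argument names. For the bc data from the question, the same pattern would be SurvFun(fun.time = rectime, fun.event = censrec, fun.dat = bc), with flexsurv loaded.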