Perform function on each level of multiple columns in R


I need to perform a function on each level of multiple columns in a data.table. For example, using the lung dataset from survival:

library(survival)
library(data.table)
library(dplyr)

data(lung)
setDT(lung)

vars <- c("sex", "ph.ecog")
lung[, (vars) := lapply(.SD, factor), .SDcols = vars]

fit <- tibble()
for (i in levels(lung[, vars ])){
temp <-
coxph(
  Surv(time, status) ~ i,
  data = lung
) %>% 
broom::tidy(exp=T)
fit <- bind_rows(fit, temp)
  }

This is not working - how can I succeed?


Solution

  • Do you want to run the function for each level of the columns in vars, or once for each column in vars?

    For the latter, you can do:

    do.call(rbind,lapply(vars, function(x) {
      broom::tidy(coxph(reformulate(x, 'Surv(time, status)'), data = lung))
    }))
    
    #  term     estimate std.error statistic   p.value conf.low conf.high
    #  <chr>       <dbl>     <dbl>     <dbl>     <dbl>    <dbl>     <dbl>
    #1 sex2       -0.531     0.167     -3.18 0.00149    -0.859     -0.203
    #2 ph.ecog1    0.369     0.199      1.86 0.0634     -0.0205     0.758
    #3 ph.ecog2    0.916     0.225      4.08 0.0000448   0.476      1.36 
    #4 ph.ecog3    2.21      1.03       2.15 0.0314      0.197      4.22 
    

    To simplify a bit since you are already using data.table, you can use rbindlist instead of do.call + rbind.
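
    As a minimal sketch of that rbindlist variant: the names dt, fits and variable below are illustrative and not from the question, and the list is named with setNames so that idcol can record which column each row came from. It assumes the survival, data.table and broom packages are installed.

```r
library(survival)
library(data.table)

# Work on a copy so the packaged dataset itself is not modified
dt <- as.data.table(survival::lung)

vars <- c("sex", "ph.ecog")
dt[, (vars) := lapply(.SD, factor), .SDcols = vars]

# Fit one Cox model per column; naming the list lets rbindlist()
# add a "variable" column identifying the source of each row
fits <- rbindlist(
  lapply(setNames(vars, vars), function(x) {
    broom::tidy(coxph(reformulate(x, "Surv(time, status)"), data = dt))
  }),
  idcol = "variable"
)

fits
```

    The result is a data.table with one row per model coefficient. If hazard ratios are wanted, as the exp = T in the question suggests, broom::tidy accepts exponentiate = TRUE for coxph fits.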

    To run this for each level of each column in your data, you can do:

    do.call(rbind, lapply(vars, function(x) {
      do.call(rbind, lapply(levels(lung[[x]]), function(y) {
        broom::tidy(coxph(reformulate(x, 'Surv(time, status)'),
                          data = lung[lung[[x]] == y]))
      }))
    }))