Tags: r, survival-analysis, tidymodels

`Augment()`ing `coxph()` output in a tidymodels workflow?


I'm trying to find an efficient way to create a dataset that contains the original data, coefficient estimates, model fit, and fitted observations from a coxph() survival model. Currently, my code looks like this:

library(tidyverse)
library(survival) # provides coxph() and the lung dataset
library(broom)

ex_model <- lung %>% # lung dataset from survival package
    nest(-sex) %>%
    mutate(fit = map(data, ~ coxph(Surv(time, status) ~ age + wt.loss + meal.cal,
                                   data = .)),
           glance = map(fit, glance),
           tidy = map(fit, tidy)) %>%
    glimpse()

# output

# Columns: 5
# $ sex    <dbl> 1, 2
# $ data   <list> [<tbl_df[138 x 9]>], [<tbl_df[90 x 9]>]
# $ fit    <list> [0.0205745842, 0.0047088643, -0.0001776546, 1.962976e-04, -3.220915e-06, 1.011575e-06, -3.220915e-06, 5.999414e-05, -1.525540e-07, 1.011575e-06, -1.…
# $ glance <list> [<tbl_df[1 x 18]>], [<tbl_df[1 x 18]>]
# $ tidy   <list> [<tbl_df[3 x 5]>], [<tbl_df[3 x 5]>]

This gives me a data frame with a column for the nesting variable (sex) and four list columns: data, fit, glance, and tidy. I would like to add a column augment containing the fitted values for each observation, but I have been unsuccessful mapping the augment function over fit.

Here is an example that generates my desired output using lm() instead of coxph():

ex_model <- lung %>% # lung dataset from survival package
    nest(-sex) %>%
    mutate(fit = map(data, ~ lm(status ~ age + wt.loss + meal.cal,
                                data = .)),
           glance = map(fit, glance),
           tidy = map(fit, tidy),
           augment = map(fit, augment)) %>%
    glimpse()

# output
# $ sex     <dbl> 1, 2
# $ data    <list> [<tbl_df[138 x 9]>], [<tbl_df[90 x 9]>]
# $ fit     <list> [1.415301e+00, 7.049341e-03, 6.981800e-04, -7.171368e-05, 0.18272053, 0.25767747, -0.90016296, 0.25907699, 0.15038564, 0.23754238, 0.22724620, 0.32~
# $ glance  <list> [<tbl_df[1 x 12]>], [<tbl_df[1 x 12]>]
# $ tidy    <list> [<tbl_df[4 x 5]>], [<tbl_df[4 x 5]>]
# $ augment <list> [<tbl_df[106 x 11]>], [<tbl_df[65 x 11]>]

When I use the mutate(augment = map(fit, augment))%>% syntax with coxph(), RStudio returns an error:

Did you want `data = c(inst, time, status, age, ph.ecog, ph.karno, pat.karno, meal.cal, 
    wt.loss)`?
Error: Problem with `mutate()` input `augment`.
x Must specify either `data` or `newdata` argument.
i Input `augment` is `map(fit, augment)`.

Is this a problem with my syntax, or is there a more fundamental reason I can't augment(fit) here? What is the most efficient way around this issue?


Solution

  • The error you want to focus on is

    x Must specify either `data` or `newdata` argument.
    

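    This error comes from augment() itself. Unlike augment.lm(), which falls back to the model frame stored inside the lm object, broom's augment.coxph() has no default for the data, so you must pass either `data` (the data the model was fitted on) or `newdata` alongside the model, e.g. augment(model_fit, newdata = my_new_data). For more information, see the documentation for augment.coxph() (?broom::augment.coxph).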

    library(tidyverse)
    library(survival)
    library(broom)
    
    ex_models <- lung %>%
        nest(data = c(inst, time, status, age, ph.ecog, ph.karno, pat.karno, meal.cal, wt.loss)) %>%
        mutate(fit = map(data, ~ coxph(Surv(time, status) ~ age + wt.loss + meal.cal, data = .)),
               glance = map(fit, glance),
               tidy = map(fit, tidy),
               augment = map(fit, augment, newdata = lung))
    
    glimpse(ex_models)
    #> Rows: 2
    #> Columns: 6
    #> $ sex     <dbl> 1, 2
    #> $ data    <list> [<tbl_df[138 x 9]>], [<tbl_df[90 x 9]>]
    #> $ fit     <list> [0.0205745842, 0.0047088643, -0.0001776546, 1.962976e-04, -3.2…
    #> $ glance  <list> [<tbl_df[1 x 18]>], [<tbl_df[1 x 18]>]
    #> $ tidy    <list> [<tbl_df[3 x 5]>], [<tbl_df[3 x 5]>]
    #> $ augment <list> [<tbl_df[228 x 12]>], [<tbl_df[228 x 12]>]
    

    Created on 2021-08-14 by the reprex package (v2.0.1)
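    Note that newdata = lung gives each sex-specific model's predictions for all 228 patients in lung (both sexes), which is why both augment tibbles above are 228 x 12. If you instead want fitted values only for the observations in each group, one option is to pair each model with its own nested data using map2(). This is a minimal sketch, assuming tidyr >= 1.0 (for nest(data = -sex), which is equivalent to listing every non-sex column as above) and the same broom behaviour as in the reprex; rows with missing covariates should get NA in .fitted because predict.coxph() uses na.pass for newdata by default. The final unnest() step is optional and produces the single flat dataset described in the question.

    ```r
    library(tidyverse)
    library(survival)
    library(broom)

    ex_models <- lung %>%
      # nest everything except sex into a list column named `data`
      nest(data = -sex) %>%
      mutate(
        fit    = map(data, ~ coxph(Surv(time, status) ~ age + wt.loss + meal.cal,
                                   data = .x)),
        glance = map(fit, glance),
        tidy   = map(fit, tidy),
        # augment each model with its OWN group's rows, not the full lung data
        augment = map2(fit, data, ~ augment(.x, newdata = .y))
      )

    # Optional: one row per patient, with sex, original columns, and .fitted
    fitted_obs <- ex_models %>%
      select(sex, augment) %>%
      unnest(augment)
    ```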