Search code examples
rdependenciesnamespacesr-packagesurvival

Using package::function inside formula in R gives different results (specifically survival::strata inside coxph)


I believed the following two formula calls should be equivalent.

install.packages("survival")
library(survival)

Model 1

coxph(Surv(futime, fustat) ~ age + strata(rx), data=ovarian)

Call:
coxph(formula = Surv(futime, fustat) ~ age + strata(rx), data = ovarian)

       coef exp(coef) se(coef)     z       p
age 0.13735   1.14723  0.04741 2.897 0.00376

Likelihood ratio test=12.69  on 1 df, p=0.0003678
n= 26, number of events= 12 

Model 2

coxph(Surv(futime, fustat) ~ age + survival::strata(rx), data=ovarian)

Call:
coxph(formula = Surv(futime, fustat) ~ age + survival::strata(rx), 
    data = ovarian)

                             coef exp(coef) se(coef)      z       p
age                       0.14733   1.15873  0.04615  3.193 0.00141
survival::strata(rx)rx=2 -0.80397   0.44755  0.63205 -1.272 0.20337

Likelihood ratio test=15.89  on 2 df, p=0.0003551
n= 26, number of events= 12 

However they are not. The second model, which references 'strata' specifically using the "::" operator, treats it like a fixed effect variable in the model, rather than fitting the model separately within levels of 'strata'. I do not know if this is:

A) An issue with using 'package::function' type calls within a formula in R, or B) An issue with the survival package and namespaces/dependencies

I want to use the call to Model 2, as I am building a package and don't want to have to attach the entire survival package (which is large) when loading mine.

Any help on this would be much appreciated.


Solution

  • In coxph(), there are these lines:

    special <- c("strata", "tt", "frailty", "ridge", "pspline")
    tform$formula <- if (missing(data)) 
      terms(formula, special)
    else terms(formula, special, data = data)
    

    Therefore, in the first case, coxph correctly detects the strata but not in the second case. We can see this using your example:

    library(survival)
    
    formula1 <- Surv(futime, fustat) ~ age + strata(rx)
    formula2 <- Surv(futime, fustat) ~ age + survival::strata(rx)
    special <- c("strata", "tt", "frailty", "ridge", "pspline")
    data <- ovarian
    
    t1 <- terms(formula1, special, data = data)
    t2 <- terms(formula2, special, data = data)
    
    attr(t1, "specials")$strata
    #> [1] 3
    attr(t2, "specials")$strata
    #> NULL
    

    While I don't know the specifics of coxph(), I would say that this difference later leads to the differences in results.


    Therefore, you should avoid using the prefix survival:: in the formula. If for some reason, you have to use this prefix, you can use the following trick:

    library(survival)
    
    # here
    strata <- survival::strata
    coxph(Surv(futime, fustat) ~ age + strata(rx), data=ovarian)
    #> Call:
    #> coxph(formula = Surv(futime, fustat) ~ age + strata(rx), data = ovarian)
    #> 
    #>        coef exp(coef) se(coef)     z       p
    #> age 0.13735   1.14723  0.04741 2.897 0.00376
    #> 
    #> Likelihood ratio test=12.69  on 1 df, p=0.0003678
    #> n= 26, number of events= 12