Search code examples
rstatasurvival-analysis

Delayed entry survival model: R vs Stata differences


I have some code in Stata that I'm trying to redo in R. I'm working on a delayed entry survival model and I want to limit the follow-up to 5 years. In Stata this is very easy and can be done as follows for example:

stset end, fail(failure) id(ID) origin(start) enter(entry) exit(time 5)
stcox var1

However, I'm having trouble recreating this in R. I've made a toy example limiting follow-up to 1000 days - here is the setup:

library(survival); library(foreign); library(rstpm2)

data(brcancer)
brcancer$start <- 0
# Make delayed entry time
brcancer$entry <- brcancer$rectime / 2
# Write to dta file for Stata
write.dta(brcancer, "brcancer.dta")

Ok so now we've set up an identical dataset for use in both R and Stata. Here is the Stata bit code and model result:

use "brcancer.dta", clear
stset rectime, fail(censrec) origin(start) enter(entry) exit(time 1000)
stcox hormon

enter image description here

And here is the R code and results:

# Limit follow-up to 1000 days
brcancer$limit <- ifelse(brcancer$rectime <1000, brcancer$rectime, 1000)
# Cox model 
mod1 <- coxph(Surv(time=entry, time2= limit, event = censrec) ~ hormon, data=brcancer, ties = "breslow")
summary(mod1)

enter image description here

As you can see the R estimates and Stata estimates differ slightly, and I cannot figure out why. Have I set up the R model incorrectly to match Stata, or is there another reason the results differ?


Solution

  • Since the methods match on an avaialble dataset after recoding the deaths that occur after to termination date, I'm posting the relevant sections of my comment as an answer.

    I also think that you should have changed any of the deaths at time greater than 1000 to be considered censored. (Notice that the numbers of events is quite different in the two sets of results.