Search code examples
rstatasurvival-analysiscox

How to reproduce Cox survival model results from Stata in R?


I am attempting to reproduce in R Cox survival model results originally obtained in Stata. Here is the Stata code:

stset t, id(leadid) failure(c_coup)

stcox legislature  lgdp_1 growth_1 exportersoffuelsmainlyoil_EL2008 ethfrac_FIXED communist mil cw age

Here is the code that I wrote do reproduce this in R:

# Load survival package
library(survival)

# Set the survival object
surv_obj <- Surv(data$t, data$c_coup)

# Run model
m1 <- coxph(surv_obj ~ legislature + lgdp_1 + growth_1 + exportersoffuelsmainlyoil_EL2008 + ethfrac_FIXED + communist + mil + cw + age, data = data, method = "breslow")

# Examine hazard ratios
exp(coef(m1))

For whatever reason, I am unable to obtain the same results. For example, the estimate for legislature in the Stata results (the original results I want to produce) is 0.298. In R, it is 0.1688371. Any advice would be appreciated.

Here is dataset preview for reproduction:

structure(list(t = structure(c(1, 2, 3, 4, 5, 6), label = "Current time in office", format.stata = "%9.0g"), 
    c_coup = structure(c(0, 0, 0, 0, 0, 0), format.stata = "%9.0g"), 
    leadid = structure(c("A2.2-208", "A2.2-208", "A2.2-208", 
    "A2.2-208", "A2.2-208", "A2.2-208"), label = "Leader ID", format.stata = "%13s"), 
    legislature = structure(c(1, 1, 1, 1, 1, 1), format.stata = "%9.0g"), 
    lgdp_1 = structure(c(7.68524360656738, 7.69938945770264, 
    7.54960918426514, 7.57916784286499, 7.6033992767334, 7.67089462280273
    ), format.stata = "%9.0g"), growth_1 = structure(c(6.35386085510254, 
    1.42463231086731, -13.910285949707, 3, 2.45273375511169, 
    6.98254346847534), label = "annual growth, t-1, Maddison", format.stata = "%9.0g"), 
    exportersoffuelsmainlyoil_EL2008 = structure(c(0, 0, 0, 0, 
    0, 0), format.stata = "%8.0g"), ethfrac_FIXED = structure(c(NA_real_, 
    NA_real_, NA_real_, NA_real_, NA_real_, NA_real_), label = "eth. frac", format.stata = "%8.0g"), 
    communist = structure(c(0, 0, 0, 0, 0, 0), label = "Communist Leader", format.stata = "%8.0g"), 
    mil = structure(c(1, 1, 1, 1, 1, 1), format.stata = "%9.0g"), 
    cw = structure(c(1, 1, 1, 1, 1, 1), format.stata = "%9.0g"), 
    age = structure(c(52, 53, 54, 55, 56, 57), label = "Current age", format.stata = "%9.0g")), row.names = c(NA, 
-6L), class = c("tbl_df", "tbl", "data.frame"))

Solution

  • You can try setting a t0 for each row:

    library(dplyr)
    library(foreign)
    
    data = read_stata("leaders.dta")
    data = mutate(data,t0 = lag(t,default=0), .by=leadid)
    
    survobj = Surv(data[["t0"]], data[["_t"]], data$c_coup)
    coxph(survobj~legislature +  lgdp_1+ growth_1 +exportersoffuelsmainlyoil_EL2008+ ethfrac_FIXED+ communist+ mil+ cw+ age, 
          data=data, ties="breslow")
    

    Output:

    Call:
    coxph(formula = survobj ~ legislature + lgdp_1 + growth_1 + exportersoffuelsmainlyoil_EL2008 + 
        ethfrac_FIXED + communist + mil + cw + age, data = data, 
        ties = "breslow")
    
                                          coef exp(coef)  se(coef)      z        p
    legislature                      -1.212411  0.297479  0.226754 -5.347 8.95e-08
    lgdp_1                           -0.337226  0.713748  0.146391 -2.304  0.02125
    growth_1                          0.009499  1.009544  0.016494  0.576  0.56470
    exportersoffuelsmainlyoil_EL2008  0.470004  1.600001  0.304949  1.541  0.12325
    ethfrac_FIXED                    -0.005648  0.994368  0.003489 -1.619  0.10552
    communist                        -1.800431  0.165228  1.008908 -1.785  0.07434
    mil                               1.157206  3.181034  0.265029  4.366 1.26e-05
    cw                                0.886741  2.427206  0.342806  2.587  0.00969
    age                               0.027398  1.027776  0.010398  2.635  0.00842
    
    Likelihood ratio test=103.8  on 9 df, p=< 2.2e-16
    n= 2903, number of events= 116 
       (2751 observations deleted due to missingness)