I am attempting to reproduce in R the results of a Cox survival model originally obtained in Stata. Here is the Stata code:
stset t, id(leadid) failure(c_coup)
stcox legislature lgdp_1 growth_1 exportersoffuelsmainlyoil_EL2008 ethfrac_FIXED communist mil cw age
Here is the code that I wrote to reproduce this in R:
# Load survival package
library(survival)
# Set the survival object
surv_obj <- Surv(data$t, data$c_coup)
# Run model
m1 <- coxph(surv_obj ~ legislature + lgdp_1 + growth_1 + exportersoffuelsmainlyoil_EL2008 + ethfrac_FIXED + communist + mil + cw + age, data = data, method = "breslow")
# Examine hazard ratios
exp(coef(m1))
For whatever reason, I am unable to obtain the same results. For example, the hazard ratio for legislature in the Stata results (the original results I want to reproduce) is 0.298. In R, it is 0.1688371. Any advice would be appreciated.
Here is a preview of the dataset for reproduction:
structure(list(t = structure(c(1, 2, 3, 4, 5, 6), label = "Current time in office", format.stata = "%9.0g"),
c_coup = structure(c(0, 0, 0, 0, 0, 0), format.stata = "%9.0g"),
leadid = structure(c("A2.2-208", "A2.2-208", "A2.2-208",
"A2.2-208", "A2.2-208", "A2.2-208"), label = "Leader ID", format.stata = "%13s"),
legislature = structure(c(1, 1, 1, 1, 1, 1), format.stata = "%9.0g"),
lgdp_1 = structure(c(7.68524360656738, 7.69938945770264,
7.54960918426514, 7.57916784286499, 7.6033992767334, 7.67089462280273
), format.stata = "%9.0g"), growth_1 = structure(c(6.35386085510254,
1.42463231086731, -13.910285949707, 3, 2.45273375511169,
6.98254346847534), label = "annual growth, t-1, Maddison", format.stata = "%9.0g"),
exportersoffuelsmainlyoil_EL2008 = structure(c(0, 0, 0, 0,
0, 0), format.stata = "%8.0g"), ethfrac_FIXED = structure(c(NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_), label = "eth. frac", format.stata = "%8.0g"),
communist = structure(c(0, 0, 0, 0, 0, 0), label = "Communist Leader", format.stata = "%8.0g"),
mil = structure(c(1, 1, 1, 1, 1, 1), format.stata = "%9.0g"),
cw = structure(c(1, 1, 1, 1, 1, 1), format.stata = "%9.0g"),
age = structure(c(52, 53, 54, 55, 56, 57), label = "Current age", format.stata = "%9.0g")), row.names = c(NA,
-6L), class = c("tbl_df", "tbl", "data.frame"))
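With id(leadid), Stata's stset treats each leader's rows as consecutive intervals (t0, t], where t0 is the previous row's t for the same leader (0 for the first row). Surv(data$t, data$c_coup) instead treats every row as a separate subject at risk from time 0, so the risk sets differ. You can try setting a t0 for each row: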
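library(dplyr)     # mutate(), lag(); the .by argument needs dplyr >= 1.1.0
library(haven)     # read_stata() comes from haven, not foreign
library(survival)  # Surv(), coxph()
data <- read_stata("leaders.dta")
# Start of each interval: the previous row's t within the same leader (0 for the first row)
data <- mutate(data, t0 = lag(t, default = 0), .by = leadid)
# Counting-process survival object: each row covers (t0, _t]
survobj <- Surv(data[["t0"]], data[["_t"]], data$c_coup)
coxph(survobj ~ legislature + lgdp_1 + growth_1 + exportersoffuelsmainlyoil_EL2008 + ethfrac_FIXED + communist + mil + cw + age,
data = data, ties = "breslow")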
Output:
Call:
coxph(formula = survobj ~ legislature + lgdp_1 + growth_1 + exportersoffuelsmainlyoil_EL2008 +
ethfrac_FIXED + communist + mil + cw + age, data = data,
ties = "breslow")
                                      coef exp(coef) se(coef)      z        p
legislature                      -1.212411  0.297479 0.226754 -5.347 8.95e-08
lgdp_1                           -0.337226  0.713748 0.146391 -2.304  0.02125
growth_1                          0.009499  1.009544 0.016494  0.576  0.56470
exportersoffuelsmainlyoil_EL2008  0.470004  1.600001 0.304949  1.541  0.12325
ethfrac_FIXED                    -0.005648  0.994368 0.003489 -1.619  0.10552
communist                        -1.800431  0.165228 1.008908 -1.785  0.07434
mil                               1.157206  3.181034 0.265029  4.366 1.26e-05
cw                                0.886741  2.427206 0.342806  2.587  0.00969
age                               0.027398  1.027776 0.010398  2.635  0.00842
Likelihood ratio test=103.8 on 9 df, p=< 2.2e-16
n= 2903, number of events= 116
(2751 observations deleted due to missingness)
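To see why the start time matters, here is a minimal sketch on made-up data (the toy data frame and its values are hypothetical, not taken from leaders.dta). It builds t0 with base R's ave() instead of dplyr, assuming rows are already sorted by t within each leader, and counts how many rows are at risk at time 2 under each setup:

library(survival)

# Hypothetical toy panel: two leaders, one row per year in office
toy <- data.frame(
  leadid = c("A", "A", "A", "B", "B"),
  t      = c(1, 2, 3, 1, 2),
  c_coup = c(0, 0, 1, 0, 0)
)

# t0 = previous row's t within the same leader, 0 for the first row
# (assumes rows are sorted by t within leadid)
toy$t0 <- ave(toy$t, toy$leadid, FUN = function(x) c(0, head(x, -1)))

# Counting-process form: each row covers the interval (t0, t]
cp_surv <- with(toy, Surv(t0, t, c_coup))

# Rows at risk at time 2 when every row is assumed to start at 0
naive_at_risk <- sum(toy$t >= 2)               # 3 rows
# Rows at risk at time 2 when each row covers only (t0, t]
cp_at_risk <- sum(toy$t0 < 2 & toy$t >= 2)     # 2 rows, one per leader

In the first setup a leader's later rows are counted again at earlier times, which inflates the risk sets. The counting-process form keeps exactly one row per leader at risk at any moment, which is what stset with id() does.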