Search code examples
rstatasurvival

How to reproduce Stata competing-risks survival analysis results in R?


I am attempting to reproduce some survival analysis results published in a journal. The original results were produced in Stata. Here is the code:

stset time, id(leadid) failure(c_coup)

streg  legislature  leg_growth_2 gdp_1k chgdpen_fearonlaitin Oil_fearonlaitin postcoldwarlag civiliandictatorshiplag militarydictatorshiplag communist lpopl1_fearonlaitin ethfrac_fearonlaitin relfrac_fearonlaitin  age, distribution(weibull) time

Here is my attempt to translate this to R code:

# load packages
library(survival)
library(survminer)

# load data
data <- read.csv("leader_tvc_2.csv", encoding = "latin1")

# create survival object
surv_object <- Surv(time, c_coup) ~ legislature + leg_growth_2 + gdp_1k + chgdpen_fearonlaitin + Oil_fearonlaitin + postcoldwarlag + civiliandictatorshiplag + militarydictatorshiplag + communist + lpopl1_fearonlaitin + ethfrac_fearonlaitin + relfrac_fearonlaitin + age

# run survreg
fit <- survreg(surv_object, data = data, dist = "weibull")

# examine output 
summary(fit)

I am unable to obtain the same results. For example, the estimate for legislature in the Stata results (the ones I want to produce) is 2.057. In R, however, it is 2.48223. Any advice would be appreciated. Here is a link to the full data.

Here is the R output:

Call:
survreg(formula = surv_object, data = data, dist = "weibull")
                           Value Std. Error     z       p
(Intercept)              3.69422    0.62813  5.88 4.1e-09
legislature              2.48223    0.21580 11.50 < 2e-16
leg_growth_2             4.06103    1.81609  2.24  0.0253
gdp_1k                  -0.02217    0.03678 -0.60  0.5467
chgdpen_fearonlaitin     0.49579    1.11792  0.44  0.6574
Oil_fearonlaitin         0.24194    0.23799  1.02  0.3094
postcoldwarlag           0.55797    0.31626  1.76  0.0777
civiliandictatorshiplag -1.87331    0.32142 -5.83 5.6e-09
militarydictatorshiplag -1.34963    0.29635 -4.55 5.3e-06
communist                0.88426    0.28211  3.13  0.0017
lpopl1_fearonlaitin     -0.00252    0.06636 -0.04  0.9697
ethfrac_fearonlaitin     0.47922    0.28625  1.67  0.0941
relfrac_fearonlaitin     0.67571    0.37836  1.79  0.0741
age                      0.01261    0.00704  1.79  0.0733
Log(scale)              -0.12430    0.06339 -1.96  0.0499

Scale= 0.883 

Weibull distribution
Loglik(model)= -829   Loglik(intercept only)= -978
    Chisq= 297.98 on 13 degrees of freedom, p= 6.4e-56 
Number of Newton-Raphson Iterations: 18 
n=3774 (5911 observations deleted due to missingness)

Here are the original results (column 3) enter image description here


Solution

  • It is all about data setup for survival analysis. There are two issues:

    1. The R survreg treats each observation as one independent sampling unit while your Stata stset time, id(leadid) failure(c_coup) specifies ID (leadid) with multiple observations. You need to set up a start-stop type Surv object.
    2. survreg would throw a error message: "start-stop type Surv objects are not supported". Use flexsurv::flexsurvreg instead:

    To get the same results as Stata in R, try this:

    library(survival)
    library(survminer)
    library(tidyverse)
    library(flexsurv)
    data <- read.csv("leader_tvc_2.csv", encoding = "latin1")
    df <- data %>% 
      mutate(time0 = lag(time, default=0), .by = leadid)
    surv_object <- Surv(time0, time, c_coup) ~ legislature + leg_growth_2 + gdp_1k + chgdpen_fearonlaitin + Oil_fearonlaitin + postcoldwarlag + civiliandictatorshiplag + militarydictatorshiplag + communist + lpopl1_fearonlaitin + ethfrac_fearonlaitin + relfrac_fearonlaitin + age
    fit <- flexsurvreg(surv_object, data=df, dist = "weibull") 
    
    > broom::tidy(fit)
    # A tibble: 15 × 5
       term                    estimate std.error statistic   p.value
       <chr>                      <dbl>     <dbl>     <dbl>     <dbl>
     1 shape                     1.09     0.0697     NA     NA       
     2 scale                    13.1      8.61       NA     NA       
     3 legislature               2.06     0.197      10.4    1.01e-25
     4 leg_growth_2              3.04     1.81        1.68   4.68e- 2
     5 gdp_1k                    0.0145   0.0382      0.380  3.52e- 1
     6 chgdpen_fearonlaitin      0.804    1.12        0.721  2.35e- 1
     7 Oil_fearonlaitin          0.126    0.250       0.505  3.07e- 1
     8 postcoldwarlag            0.517    0.329       1.57   5.84e- 2
     9 civiliandictatorshiplag  -1.28     0.325      -3.95   3.91e- 5
    10 militarydictatorshiplag  -0.882    0.306      -2.88   1.96e- 3
    11 communist                 0.480    0.282       1.71   4.40e- 2
    12 lpopl1_fearonlaitin       0.0857   0.0664      1.29   9.87e- 2
    13 ethfrac_fearonlaitin      0.419    0.298       1.41   7.97e- 2
    14 relfrac_fearonlaitin      0.491    0.387       1.27   1.02e- 1
    15 age                      -0.0156   0.00780    -2.00   2.27e- 2
    >