Search code examples
rstatasurvival-analysisstargazer

Why are the significance levels incorrect when presenting survival analysis results in terms of hazard ratios and exporting to LaTeX using stargazer?


I am reproducing in R some survival analysis results published in a journal. The original results were produced in Stata. Here are the original results: enter image description here. Here is the code I use to reproduce these results in R.

# load packages
library(dplyr)
library(foreign)
library(stargazer)

# load Svolik's original data 
data = read_stata("leaders, institutions, covariates, updated tvc.dta")

# set a t0 for each row
data = mutate(data,t0 = lag(t,default=0), .by=leadid)

# coup survival object original
survobj_coup = Surv(data[["t0"]], data[["_t"]], data$c_coup)

# coup model original
coups_original <- coxph(survobj_coup~legislature +  lgdp_1+ growth_1 +exportersoffuelsmainlyoil_EL2008+ ethfrac_FIXED+ communist+ mil+ cw+ age, 
      data=data, ties="breslow")

# revolt survival object original 
survobj_revolt = Surv(data[["t0"]], data[["_t"]], data$c_revolt)

# revolt model original 
revolt_original <- coxph(survobj_revolt~legislature +  lgdp_1+ growth_1 +exportersoffuelsmainlyoil_EL2008+ ethfrac_FIXED+ mil+ cw+ age, 
                        data=data, ties="breslow")

# natural survival object original
survobj_natural = Surv(data[["t0"]], data[["_t"]], data$c_natural)

# natural model original
natural_original <- coxph(survobj_natural~legislature +  lgdp_1+ growth_1 +exportersoffuelsmainlyoil_EL2008+ ethfrac_FIXED+ communist+ mil+ cw+ age, 
                        data=data, ties="breslow")

# Define a function to exponentiate coefficients
exp_coef <- function(x) { round(exp(x), 3) }

# Create the table using stargazer
stargazer(natural_original, coups_original, revolt_original, apply.coef = exp_coef)

Here is the output. Look at Models 1, 3, and 5 (disregard the others). Notice that while the coefficients are the same, the stars are all wonky. Does anyone know what's going on? Interestingly, when I present the results without exponentiating the coefficients (i.e., when I don't present the results in terms of hazard ratios), the stars are correct. But I need to present the results in terms of hazard ratios. Any advice would be appreciated!

enter image description here


Solution

  • The stargazer() function has an argument called p.auto which defaults to TRUE. The documentation states:

    a logical value that indicates whether stargazer should calculate the p-values, using the standard normal distribution, if coefficients or standard errors are supplied by the user (from arguments coef and se) or modified by a function (from arguments apply.coef or apply.se).

    That is, if you transform the coefficients the default behaviour is to recalculate p-values from the transformed coefficients and untransformed standard errors based on the standard normal distribution. In other words: nonsense.

    You’ll need to specify p.auto = FALSE to keep using the p-values calculated by coxph(). Here’s a reproducible example:

    library(survival)
    library(stargazer) |> suppressPackageStartupMessages()
    
    fit <- coxph(Surv(futime, fustat) ~ age + factor(ecog.ps), data = ovarian)
    fit
    #> Call:
    #> coxph(formula = Surv(futime, fustat) ~ age + factor(ecog.ps), 
    #>     data = ovarian)
    #> 
    #>                     coef exp(coef) se(coef)     z       p
    #> age              0.16150   1.17527  0.04992 3.235 0.00122
    #> factor(ecog.ps)2 0.01866   1.01884  0.59908 0.031 0.97515
    #> 
    #> Likelihood ratio test=14.29  on 2 df, p=0.000787
    #> n= 26, number of events= 12
    

    stargazer reported p-value indicators don’t match:

    stargazer(fit, apply.coef = exp, type = "text")
    #> 
    #> ================================================
    #>                          Dependent variable:    
    #>                      ---------------------------
    #>                                futime           
    #> ------------------------------------------------
    #> age                           1.175***          
    #>                                (0.050)          
    #>                                                 
    #> factor(ecog.ps)2               1.019*           
    #>                                (0.599)          
    #>                                                 
    #> ------------------------------------------------
    #> Observations                     26             
    #> R2                              0.423           
    #> Max. Possible R2                0.932           
    #> Log Likelihood                 -27.838          
    #> Wald Test                10.540*** (df = 2)     
    #> LR Test                  14.295*** (df = 2)     
    #> Score (Logrank) Test     12.261*** (df = 2)     
    #> ================================================
    #> Note:                *p<0.1; **p<0.05; ***p<0.01
    

    … unless you specifically tell it not to recalculate them:

    stargazer(fit, apply.coef = exp, type = "text", p.auto = FALSE)
    #> 
    #> ================================================
    #>                          Dependent variable:    
    #>                      ---------------------------
    #>                                futime           
    #> ------------------------------------------------
    #> age                           1.175***          
    #>                                (0.050)          
    #>                                                 
    #> factor(ecog.ps)2                1.019           
    #>                                (0.599)          
    #>                                                 
    #> ------------------------------------------------
    #> Observations                     26             
    #> R2                              0.423           
    #> Max. Possible R2                0.932           
    #> Log Likelihood                 -27.838          
    #> Wald Test                10.540*** (df = 2)     
    #> LR Test                  14.295*** (df = 2)     
    #> Score (Logrank) Test     12.261*** (df = 2)     
    #> ================================================
    #> Note:                *p<0.1; **p<0.05; ***p<0.01