I am reproducing in R
some survival analysis results published in a journal. The original results were produced in Stata. Here are the original results: . Here is the code I use to reproduce these results in R
.
# load packages
library(dplyr)
library(foreign)
library(stargazer)
# load Svolik's original data
data = read_stata("leaders, institutions, covariates, updated tvc.dta")
# set a t0 for each row
data = mutate(data,t0 = lag(t,default=0), .by=leadid)
# coup survival object original
survobj_coup = Surv(data[["t0"]], data[["_t"]], data$c_coup)
# coup model original
coups_original <- coxph(survobj_coup~legislature + lgdp_1+ growth_1 +exportersoffuelsmainlyoil_EL2008+ ethfrac_FIXED+ communist+ mil+ cw+ age,
data=data, ties="breslow")
# revolt survival object original
survobj_revolt = Surv(data[["t0"]], data[["_t"]], data$c_revolt)
# revolt model original
revolt_original <- coxph(survobj_revolt~legislature + lgdp_1+ growth_1 +exportersoffuelsmainlyoil_EL2008+ ethfrac_FIXED+ mil+ cw+ age,
data=data, ties="breslow")
# natural survival object original
survobj_natural = Surv(data[["t0"]], data[["_t"]], data$c_natural)
# natural model original
natural_original <- coxph(survobj_natural~legislature + lgdp_1+ growth_1 +exportersoffuelsmainlyoil_EL2008+ ethfrac_FIXED+ communist+ mil+ cw+ age,
data=data, ties="breslow")
# Define a function to exponentiate coefficients
exp_coef <- function(x) { round(exp(x), 3) }
# Create the table using stargazer
stargazer(natural_original, coups_original, revolt_original, apply.coef = exp_coef)
Here is the output. Look at Models 1, 3, and 5 (disregard the others). Notice that while the coefficients are the same, the stars are all wonky. Does anyone know what's going on? Interestingly, when I present the results without exponentiating the coefficients (i.e., when I don't present the results in terms of hazard ratios), the stars are correct. But I need to present the results in terms of hazard ratios. Any advice would be appreciated!
The stargazer()
function has an argument called p.auto
which defaults to
TRUE
. The documentation states:
a logical value that indicates whether stargazer should calculate the p-values, using the standard normal distribution, if coefficients or standard errors are supplied by the user (from arguments coef and se) or modified by a function (from arguments apply.coef or apply.se).
That is, if you transform the coefficients the default behaviour is to recalculate p-values from the transformed coefficients and untransformed standard errors based on the standard normal distribution. In other words: nonsense.
You’ll need to specify p.auto = FALSE
to keep using the p-values calculated by coxph()
.
Here’s a reproducible example:
library(survival)
library(stargazer) |> suppressPackageStartupMessages()
fit <- coxph(Surv(futime, fustat) ~ age + factor(ecog.ps), data = ovarian)
fit
#> Call:
#> coxph(formula = Surv(futime, fustat) ~ age + factor(ecog.ps),
#> data = ovarian)
#>
#> coef exp(coef) se(coef) z p
#> age 0.16150 1.17527 0.04992 3.235 0.00122
#> factor(ecog.ps)2 0.01866 1.01884 0.59908 0.031 0.97515
#>
#> Likelihood ratio test=14.29 on 2 df, p=0.000787
#> n= 26, number of events= 12
stargazer reported p-value indicators don’t match:
stargazer(fit, apply.coef = exp, type = "text")
#>
#> ================================================
#> Dependent variable:
#> ---------------------------
#> futime
#> ------------------------------------------------
#> age 1.175***
#> (0.050)
#>
#> factor(ecog.ps)2 1.019*
#> (0.599)
#>
#> ------------------------------------------------
#> Observations 26
#> R2 0.423
#> Max. Possible R2 0.932
#> Log Likelihood -27.838
#> Wald Test 10.540*** (df = 2)
#> LR Test 14.295*** (df = 2)
#> Score (Logrank) Test 12.261*** (df = 2)
#> ================================================
#> Note: *p<0.1; **p<0.05; ***p<0.01
… unless you specifically tell it not to recalculate them:
stargazer(fit, apply.coef = exp, type = "text", p.auto = FALSE)
#>
#> ================================================
#> Dependent variable:
#> ---------------------------
#> futime
#> ------------------------------------------------
#> age 1.175***
#> (0.050)
#>
#> factor(ecog.ps)2 1.019
#> (0.599)
#>
#> ------------------------------------------------
#> Observations 26
#> R2 0.423
#> Max. Possible R2 0.932
#> Log Likelihood -27.838
#> Wald Test 10.540*** (df = 2)
#> LR Test 14.295*** (df = 2)
#> Score (Logrank) Test 12.261*** (df = 2)
#> ================================================
#> Note: *p<0.1; **p<0.05; ***p<0.01