Search code examples
rregressionsurvival-analysismarginal-effectsr-marginaleffects

NA values when using avg_predictions() from marginaleffects with a flexsurv model


I'm unsure why I do not get standard errors or confidence intervals when I fit parametric survival models and compute the predicted mean survival times using the avg_predictions(). I tried using different parametric distributions as well out of curiosity and that didn't change anything.

I run the following code:

library(survival)
library(flexsurv)
library(marginaleffects)

data(lung)

# fit flexible parametric survival model
fmodel1 <- list(
  "Weibull" = flexsurvreg(Surv(time, status) ~ factor(sex), data = lung, dist = "weibull"),
  "Gamma" = flexsurvreg(Surv(time, status) ~ factor(sex), data = lung, dist = "gamma"),
  "Gengamma" = flexsurvreg(Surv(time, status) ~ factor(sex), data = lung, dist = "gengamma"),
  "llogis" = flexsurvreg(Surv(time, status) ~ factor(sex), data = lung, dist = "llogis"),
  "llnorm" = flexsurvreg(Surv(time, status) ~ factor(sex), data = lung, dist = "lognormal"),
  "gompertz" = flexsurvreg(Surv(time, status) ~ factor(sex), data = lung, dist = "gompertz")
  )

# using marginaleffects with flexsurv
avg_predictions(fmodel1$Weibull, variables = "sex")

Below is the result I get:

 sex Estimate Std. Error   z Pr(>|z|)    S 2.5 % 97.5 %
   1      331         NA  NA       NA   NA    NA     NA
   2      490       64.5 7.6   <0.001 44.9   363    616

Columns: sex, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high 
Type:  response 

I'm unsure why this produces NAs for all uncertainty estimates for the reference level of the sex variable.


Solution

  • This has to do with how the Delta Method is used to compute standard errors. This is too complicated to explain in a SO answer, but you can find details in this vignette:

    https://marginaleffects.com/vignettes/uncertainty.html

    The gist of it is that standard errors are obtained by first creating a matrix of numeric derivatives of the predictions with respect to the coefficients. Think of it as a matrix with information on “how much do my predictions change when one coefficient changes by a small amount?”

    In your case, there is only one coefficient estimate in the model and one column in the design matrix:

    library(survival)
    library(flexsurv)
    library(marginaleffects)
    
    data(cancer, package = "survival")
    
    mod = flexsurvreg(Surv(time, status) ~ factor(sex), data = lung, dist = "weibull")
    
    coef(mod)
    >        shape        scale factor(sex)2 
    >     0.280921     5.884162     0.395578
    
    head(model.matrix(mod))
    >   factor(sex)2
    > 1            0
    > 2            0
    > 3            0
    > 4            0
    > 5            0
    > 6            0
    

    When we perturb the factor(sex)2 coefficient to see how predictions change, we see that predictions do change for units where sex is 2, but do not change when sex is 1. Since there's never any change for those units, there's no standard error.

    By the way, in this simple model it makes no difference, but I strongly encourage you to read the documentation in ?predictions for the difference between variables and by. It’s kind of nuanced but my guess is that most people will want this:

    avg_predictions(mod, by = "sex")
    > 
    >  sex Estimate Std. Error    z Pr(>|z|)    S 2.5 % 97.5 %
    >    1      331         NA   NA       NA   NA    NA     NA
    >    2      491       62.7 7.83   <0.001 47.6   368    614
    > 
    > Columns: sex, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high 
    > Type:  response