Search code examples
ranovainteractioncox-regression

How to find overall significance for main effects in a dummy interaction using anova()


I run a Cox Regression with two categorical variables (x1 and x2) and their interaction. I need to know the significance of the overall effect of x1, x2 and of the interaction.

The overall effect of the interaction:

I know how do find out the overall effect of the interaction using anova():

library(survival)
fit_x1_x2 <- coxph(Surv(time, death) ~ x1 + x2        , data= df)
fit_full <-  coxph(Surv(time, death) ~ x1 + x2 + x1:x2, data= df)
anova(fit_x1_x2, fit_full)

But how are we supposed to use anova() to find out the overall effect of x1 or x2? What I tried is this:

The overall effect of x1

fit_x2_ia <- coxph(Surv(time, death) ~      x2 + x1:x2, data= df)
fit_full <- coxph(Surv(time, death)  ~ x1 + x2 + x1:x2, data= df)
anova(fit_x2_ia, fit_full)

The overall effect of x2

fit_x1_ia <- coxph(Surv(time, death) ~ x1 +      x1:x2, data= df)
fit_full <- coxph(Surv(time, death)  ~ x1 + x2 + x1:x2, data= df)
anova(fit_x1_ia, fit_full)

I am not sure whether this is how we are supposed to use anova(). The fact that the output shows degree of freedom is zero makes me sceptical. I am even more puzzled that both times, for the overall effect of x1 and x2, the test is significant, although the log likelihood values of the models are the same and the Chi value is zero.

Here is the data I used

set.seed(1) # make it reproducible
df <- data.frame(x1= rnorm(1000), x2= rnorm(1000)) # generate data         
df$death <- rbinom(1000,1, 1/(1+exp(-(1 + 2 * df$x1 + 3 * df$x2 + df$x1 * df$x2)))) # dead or not
library(tidyverse) # for cut_number() function
df$x1 <- cut_number(df$x1, 4); df$x2 <- cut_number(df$x2, 4) # make predictors to groups
df$time <- rnorm(1000); df$time[df$time<0] <- -df$time[df$time<0] # add survival times

Solution

  • The answer from 42- is informative but after reading it I still did not know how to determine the three p values or if this is possible at all. Thus I talked to the professor of biostatistic of my university. His answer was quite simple and I share it in case others have similar questions.

    In this design it is not possible to determine the three p values for overall effect of x1, x2 and their interaction. If we want to know the p values of the three overall effects, we need to keep the continuous variables as they are. But breaking up the variables into groups answers a different question, hence we can not test the hypothesis of the overall effects no matter which statisstical model we use.