Search code examples
rcox-regression

Cox regression HR grouping


I would like to perform a Cox regression for the following questions: A group of patients receives a treatment "drug" or not (0 / 1). My time variable "time" tells me, how many days the patient is observed and "status" if the patient survived or died (died = 1, survived = 0).

library(survival)

set.seed(123)

df <- data.frame(time = round(runif(100, min = 1, max = 70)), 
                 status = round(runif(100, min = 0, max = 1)),
                 drug = round(runif(100, min = 0, max = 1)),
                 age40 = round(runif(100, min = 0, max = 1)), 
                 stringsAsFactors = FALSE)

object <- Surv(df$time, df$status)
model <- coxph(object ~ drug, data = df)
summary(model)

This works fine for me and tells me, that the HR is 0.89, so the drug prevents patients from dying.

Now I want to do some subgroup analysis, f.e. how does the HR change, if the patient is <= 40 years or > 40 years old (age40: 0 vs 1).

Is all I have to do to include the variable "age40" into the coxph?

object2 <- Surv(df$time, df$status)
model2 <- coxph(object2 ~ drug + age40, data = df)
summary(model2)

If I do that my HR in the summary for drug1 slightly changes to 0.86 and I get another one for age40 (1.12).

Now my question is: How are the Hazard Ratios for dying under treatment (drug = 1) if the patient is <= 40 or > 40 years old.

EDIT: Another question would be to graphically show the different HRs of the effect of drug on status in a forest plot, f.e. like this: https://rpkgs.datanovia.com/survminer/reference/ggforest-2.png. Instead of "sex", "rx", "adhere" etc. I would like to show the HRs for Age40 = 0 vs. 1 and other variables as well, like hypertension = 0 vs. 1, smoker = 0 vs. 1.

Thank you!


Solution

  • The function you need to use is predict on your model2, and it needs to be supplied with a newdata argument that includes all the cases that you want to consider:

    exp( predict(model2, newdata=expand.grid(drug=c(0,1), age40=c(0,1))) )
    #        1         2         3         4 
    #1.0000000 0.8564951 1.1268713 0.9651598 
    

    You now have all 4 cases of possible combinations of drug and age40. The base case has a value of unity because you are estimating risk ratios form a baseline case of {drug=0, age40=0} You can see what the other risk ratios are associated with

    expand.grid(drug=c(0,1), age40=c(0,1))
      drug age40
    1    0     0
    2    1     0
    3    0     1
    4    1     1
    

    Notice that the ratio of drug=0 to drug=1 is the same for each age category considered separately. If you had wanted to see if the effects of drug was different in the two age categories you would have used an interaction model:

    model3 <- coxph(object2 ~ drug * age40, data = df)
    summary(model3)
    #----------------
    Call:
    coxph(formula = object2 ~ drug * age40, data = df)
    
      n= 100, number of events= 50 
    
                   coef exp(coef) se(coef)      z Pr(>|z|)
    drug       -0.18524   0.83091  0.45415 -0.408    0.683
    age40       0.09611   1.10089  0.39560  0.243    0.808
    drug:age40  0.05679   1.05843  0.63094  0.090    0.928
    
               exp(coef) exp(-coef) lower .95 upper .95
    drug          0.8309     1.2035    0.3412     2.024
    age40         1.1009     0.9084    0.5070     2.390
    drug:age40    1.0584     0.9448    0.3073     3.645
    
    Concordance= 0.528  (se = 0.042 )
    Likelihood ratio test= 0.34  on 3 df,   p=1
    Wald test            = 0.33  on 3 df,   p=1
    Score (logrank) test = 0.33  on 3 df,   p=1
    

    And the effect estimates are now a bit different:

     exp( predict(model3, newdata=expand.grid(drug=c(0,1), age40=c(0,1))) )
    #        1         2         3         4 
    #1.0000000 0.8309089 1.1008850 0.9681861