Search code examples
rcategorical-datacox-regression

How to code categorical variable with three levels for Cox with random effects model?


I want to estimate regression parameters of a Cox random effects model. Let us say that I have a categorical variable with two levels, sex for example. Then coding the variable is straightforward: 0 if male and 1 if female for example. The interpretation of the regression coefficient associated to that variable is simple.

Now let us say that I have a categorical variable with three levels. If I just code the variable with 0,1,2 for the three levels (A,B and C), the estimation of the associated regression coefficient would not be what I am looking for. If I want to estimate the risks associated with each "level" wrt the other levels, how should I code the variable ?

What I have done so far:

I define three variables. I define one variable where I code level A as 1 and the rest (levels B and C) as 0. I define another variable where I code level B as 1 and the rest (levels A and C) as 0. Finally, I define a variable where I code level C as 1 and the rest (levels A and B) as 0.

I then estimate the three regression parameters assocaited to the variables.

Just to be clear, I do not want to use any package such as coxph, coxme, survival, etc.

Is there an easier way to to this ?


Solution

  • Your description (one predictor variable that is all ones, the other two predictor variables as indicator variables for groups B and C) is exactly recapitulating the standard treatment contrasts that R uses.

    If you want to construct a model matrix with treatment contrasts for a single factor f (inside a data frame d), then model.matrix(~f, data=d) will work

    d <- data.frame(f=factor(c("A","B","B","C","A")))
    model.matrix(~f, data=d)
    

    Results:

      (Intercept) fB fC
    1           1  0  0
    2           1  1  0
    3           1  1  0
    4           1  0  1
    5           1  0  0
    attr(,"assign")
    [1] 0 1 1
    attr(,"contrasts")
    attr(,"contrasts")$f
    [1] "contr.treatment"
    

    You can use other contrasts if you like; these will change the parameter estimates (and interpretation!) for your individual variables, but not the overall model fit. e.g.

    model.matrix(~f , data=d, contrasts=list(f=contr.sum))