Search code examples
predictsurvival-analysiscox-regression

How to get fitted values from clogit model


I am interested in getting the fitted values at set locations from a clogit model. This includes the population level response and the confidence intervals around it. For example, I have data that looks approximately like this:

set.seed(1)
data <- data.frame(Used = rep(c(1,0,0,0),1250),
               Open = round(runif(5000,0,50),0),
               Activity = rep(sample(runif(24,.5,1.75),1250, replace=T), each=4),
               Strata = rep(1:1250,each=4))

Within the Clogit model, activity does not vary within a strata, thus there is no activity main effect.

mod <- clogit(Used ~ Open + I(Open*Activity) + strata(Strata),data=data)

What I want to do is build a newdata frame at which I can eventually plot marginal fitted values at specified locations of Open similar to a newdata design in a traditional glm model: e.g.,

newdata <- data.frame(Open = seq(0,50,1),
                  Activity = rep(max(data$Activity),51))

However, when I try to run a predict function on the clogit, I get the following error:

fit<-predict(mod,newdata=newdata,type = "expected")

Error in Surv(rep(1, 5000L), Used) : object 'Used' not found

I realize this is because clogit in r is being run throught Cox.ph, and thus, the predict function is trying to predict relative risks between pairs of subjects within the same strata (in this case= Used).

My question, however is if there is a way around this. This is easily done in Stata (using the Margins Command), and manually in Excel, however I would like to automate in R since everything else is programmed there. I have also built this manually in R (example code below), however I keep ending up with what appear to be incorrect CIs in my real data, as a result I would like to rely on the predict function if possible. My code for manual prediction is:

coef<-data.frame(coef = summary(mod)$coefficients[,1],
             se= summary(mod)$coefficients[,3])
coef$se <-summary(mod)$coefficients[,4]
coef$UpCI <- coef[,1] + (coef[,2]*2) ### this could be *1.96 but using 2 for simplicity
coef$LowCI <-coef[,1] - (coef[,2]*2) ### this could be *1.96 but using 2 for simplicity

fitted<-data.frame(Open= seq(0,50,2),  
               Activity=rep(max(data$Activity),26))

fitted$Marginal <- exp(coef[1,1]*fitted$Open + 
                     coef[2,1]*fitted$Open*fitted$Activity)/
              (1+exp(coef[1,1]*fitted$Open + 
                     coef[2,1]*fitted$Open*fitted$Activity))

fitted$UpCI <- exp(coef[1,3]*fitted$Open + 
                  coef[2,3]*fitted$Open*fitted$Activity)/
             (1+exp(coef[1,3]*fitted$Open + 
                  coef[2,3]*fitted$Open*fitted$Activity))

fitted$LowCI <- exp(coef[1,4]*fitted$Open + 
                  coef[2,4]*fitted$Open*fitted$Activity)/
            (1+exp(coef[1,4]*fitted$Open + 
                  coef[2,4]*fitted$Open*fitted$Activity))

My end product would ideally look something like this but a product of the predict function....

Example output of fitted values.


Solution

  • Evidently Terry Therneau is less a purist on the matter of predictions from clogit models: http://markmail.org/search/?q=list%3Aorg.r-project.r-help+predict+clogit#query:list%3Aorg.r-project.r-help%20predict%20clogit%20from%3A%22Therneau%2C%20Terry%20M.%2C%20Ph.D.%22+page:1+mid:tsbl3cbnxywkafv6+state:results

    Here's a modification to your code that does generate the 51 predictions. Did need to put in a dummy Strata column.

     newdata <- data.frame(Open = seq(0,50,1),
                       Activity = rep(max(data$Activity),51), Strata=1)
    
     risk <- predict(mod,newdata=newdata,type = "risk")
    
    > risk/(risk+1)
            1         2         3         4         5         6         7 
    0.5194350 0.5190029 0.5185707 0.5181385 0.5177063 0.5172741 0.5168418 
            8         9        10        11        12        13        14 
    0.5164096 0.5159773 0.5155449 0.5151126 0.5146802 0.5142478 0.5138154 
           15        16        17        18        19        20        21 
    0.5133829 0.5129505 0.5125180 0.5120855 0.5116530 0.5112205 0.5107879 
           22        23        24        25        26        27        28 
    0.5103553 0.5099228 0.5094902 0.5090575 0.5086249 0.5081923 0.5077596 
           29        30        31        32        33        34        35 
    0.5073270 0.5068943 0.5064616 0.5060289 0.5055962 0.5051635 0.5047308 
           36        37        38        39        40        41        42 
    0.5042981 0.5038653 0.5034326 0.5029999 0.5025671 0.5021344 0.5017016 
           43        44        45        46        47        48        49 
    0.5012689 0.5008361 0.5004033 0.4999706 0.4995378 0.4991051 0.4986723 
           50        51 
    0.4982396 0.4978068 
    

    {Warning} : It's actually rather difficult for mere mortals to determine which of the R-gods to believe on this one. I've learned so much R and statistics form each of those experts. I suspect there are matters of statistical concern or interpretation that I don't really understand.