Search code examples
rsurvival-analysisglmnetcox-regression

How to create a binary preditor from a multivariate glmnet (coxnet) model?


Let´s use the following example:

generate survival data (1000 samples with 30 variables)

library(glmnet)
library(survival)
set.seed(10101)
N=1000;p=30
nzc=p/3
x=matrix(rnorm(N*p),N,p)
beta=rnorm(nzc)
fx=x[,seq(nzc)]%*%beta/3
hx=exp(fx)
ty=rexp(N,hx)
tcens=rbinom(n=N,prob=.3,size=1) 
y=cbind(time=ty,status=1-tcens)  

use glmnet to identify variables that are related to survival

fit=glmnet(x,y,family="cox")
cvfit <- cv.glmnet(x, y, family="cox")
plot(cvfit)
coefficients <- coef(fit, s = cvfit$lambda.min)
active_coefficients <- coefficients[,1] != 0

subset matrix and only keep those parameters (n=17) that were identified as relevant by glmnet

x_selected <- x[,active_coefficients]

generate a cox model with relevant parameters (n=17)

summary(coxph(Surv(y[,1],y[,2])~x_selected))

The question that now appers to me is if & how can I subsume the information from the n=17 parameters to get a single (ideally binary) predictor variable to create a Kaplan-Meier plot that illustrates the prognostic performance of this 17-parameter based signature. I could use a PCA and binarize the principal component (and then use this for the Kaplan-Meier plot) but I´m sure there must be a more elegant way since basically the identical analysis that I´d like to perform has recently been done by others (see http://ascopubs.org/doi/pdf/10.1200/JCO.2012.45.5626 & http://ascopubs.org/doi/suppl/10.1200/jco.2012.45.5626/suppl_file/DS2_JCO.2012.45.5626.pdf -> the authors used glmnet and identified 20-genes to be relevant for predicting survival (so far my code is identical). They then however also show Kaplan-Meier plots where they pooled this "20 gene signature" into a single variable with 3 levels ["low","medium","high"] - look at Figure 1 C & D. I´m not sure how I can reproduce this with my example. Any ideas?

Thank you!


Solution

  • Already found the solution - continue the analysis as follows:

    cox_model <- coxph(Surv(y)~x_selected)
    
    #generate a linear predictor from my cox_model
    linear_predictor <- predict(cox_model, type="lp")
    
    #check the linear predictor
    coxph(Surv(y) ~ linear_predictor)
    
    #stone-beran estimate of survival curve
    df <- cbind.data.frame(y,linear_predictor)
    s <- prodlim(Surv(time,status) ~ linear_predictor, data=df)
    
    #plot survival curve
    xl <- c(0,60)
    plot(s, xlab="Time (months)", ylab="Survival rate",
         col=c("green","blue","red"), automar=TRUE, axes=FALSE, atrisk=FALSE,
         confint=FALSE, legend=TRUE,
         legend.title="Coxnet signature", legend.legend=c("low levels", "medium
    levels","high levels"), legend.x="bottomright", legend.cex=0.8, xlim=xl)
    axis(side=1, at=seq(0,240,12))
    axis(side=2, at=seq(0,1,.2))