I want to find some good predictors (genes). This is my data, log-transformed RNA-seq expression values:
          TRG    CDK6   EGFR   KIF2C  CDC20
Sample 1  TRG12  11.39  10.62   9.75  10.34
Sample 2  TRG12  10.16   8.63   8.68   9.08
Sample 3  TRG12   9.29  10.24   9.89  10.11
Sample 4  TRG45  11.53   9.22   9.35   9.13
Sample 5  TRG45   8.35  10.62  10.25  10.01
Sample 6  TRG45  11.71  10.43   8.87   9.44
I have calculated a confusion matrix for several models, as described below.
1- I tested each of the 23 genes individually with the code below, and kept every gene with a p-value < 0.05 as a good predictor. For example, for CDK6 I ran:
glm=glm(TRG ~ CDK6, data = df, family = binomial(link = 'logit'))
In the end I obtained five genes, which I put together in this model:
final <- glm(TRG ~ CDK6 + CXCL8 + IL6 + ISG15 + PTGS2 , data = df, family = binomial(link = 'logit'))
I want a plot like this showing the ROC curve of each model, but I don't know how to do that.
Any help please?
I will give you an answer using the pROC package. Disclaimer: I am the author and maintainer of the package. There are alternative ways to do it.
The plot you are seeing was probably generated by the ggroc function of pROC. In order to generate such a plot from glm models, you need to 1) use the predict function to generate the predictions, 2) generate the ROC curves and store them in a list, preferably named so that you get a legend automatically, and 3) call ggroc.
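library(pROC)
# Fit the single-gene model and the final five-gene model
glm.cdk6 <- glm(TRG ~ CDK6, data = df, family = binomial(link = 'logit'))
final <- glm(TRG ~ CDK6 + CXCL8 + IL6 + ISG15 + PTGS2 , data = df, family = binomial(link = 'logit'))
# predict() returns the linear predictor (log-odds) by default; the ROC curve
# only depends on the ranking of the predictions, so this scale works fine
rocs <- list()
rocs[["CDK6"]] <- roc(df$TRG, predict(glm.cdk6))
rocs[["final"]] <- roc(df$TRG, predict(final))
# The names of the list become the legend labels
ggroc(rocs)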
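If you want one curve per gene, the same three steps can be wrapped in a loop. The following is only a minimal sketch: it rebuilds a small data frame from the six samples shown in the question so that it runs on its own, and the genes vector is a hypothetical candidate list that you would replace with your own 23 gene names. With so few samples the curves are not meaningful; it only illustrates the mechanics.

library(pROC)

# The six samples from the question, used only to make this sketch runnable
df <- data.frame(
  TRG   = factor(c("TRG12", "TRG12", "TRG12", "TRG45", "TRG45", "TRG45")),
  CDK6  = c(11.39, 10.16, 9.29, 11.53, 8.35, 11.71),
  EGFR  = c(10.62, 8.63, 10.24, 9.22, 10.62, 10.43),
  KIF2C = c(9.75, 8.68, 9.89, 9.35, 10.25, 8.87),
  CDC20 = c(10.34, 9.08, 10.11, 9.13, 10.01, 9.44)
)

# Hypothetical list of candidate genes (assumption: replace with your own)
genes <- c("CDK6", "EGFR", "KIF2C", "CDC20")

# Fit one logistic model per gene and build its ROC curve
rocs <- lapply(genes, function(gene) {
  fit <- glm(reformulate(gene, response = "TRG"), data = df,
             family = binomial(link = "logit"))
  roc(df$TRG, predict(fit), quiet = TRUE)
})
names(rocs) <- genes  # named list, so ggroc draws a legend

sapply(rocs, auc)     # area under the curve for each single-gene model
ggroc(rocs)           # all curves on one plot (requires ggplot2)

You can append the multi-gene model to the same list, for example rocs[["final"]] <- roc(df$TRG, predict(final)), before calling ggroc.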