I have run a glm() model, and now I would like to measure the model's accuracy with PPV, NPV, sensitivity and specificity. However, I keep getting confusing results.
My outcome is a factor variable that looks like this:
table(mydata$outcome)
0 1
6824 359
The predictors are a combination of continuous variables and one categorical variable (gender).
My code is:
# To run the logistic model
mod <- glm(outcome~predictor1+predictor2+predictor3,data=mydata,family=binomial("logit"))
summary(mod)
# To run predict() to get the predicted values of the outcome
predicted = predict(object = mod, newdata=mydata, type = "response")
The results for this look like this:
head(predicted)
1 2 3 4 5 6
0.02568802 0.02979873 0.01920584 0.01077031 0.01279325 0.09725329
This is very surprising, as I was expecting to see predicted '1' (cases) vs '0' (controls), which I could then use to obtain the accuracy measures of the model, either with confusionMatrix(predicted, mydata$outcome) or using the ModelMetrics library.
So my question is: how can I get a 2x2 table (predicted vs observed) that I can use to measure the accuracy of my glm() model in predicting the outcome? I will be grateful for any advice, or please let me know if there are better ways of getting the PPV, NPV, sensitivity and specificity. Thank you.
Your glm() model is giving predicted probabilities of the outcome, not class labels. Typically, one assigns '1' to any observation with predicted probability >= 0.5 and '0' otherwise; you can do this with round().

In more 'machine-learny' situations, one might consider cutoffs other than 0.5. You can use the ifelse() function for that. For example, if you want to assign '1' only to cases with predicted probability above 0.7, you could say vals <- ifelse(predicted > 0.7, 1, 0) (note that the cutoff is applied to the predicted probabilities, not to the observed outcome).

Finally, the table you want is usually called a confusion matrix. It can be computed via various packages, but here is a nice solution from a sister site: R: how to make a confusion matrix for a predictive model?
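Putting it together, here is a minimal sketch in base R, reusing mod and mydata from your question. The 0.5 cutoff and the choice of '1' as the positive class are assumptions you may want to adjust:

# Predicted probability of outcome '1' for each observation
predicted <- predict(mod, newdata = mydata, type = "response")

# Turn probabilities into class labels; fixing the factor levels keeps the
# table 2x2 even if one class is never predicted
pred_class <- factor(ifelse(predicted >= 0.5, 1, 0), levels = c(0, 1))
observed   <- factor(mydata$outcome, levels = c(0, 1))

# Confusion matrix: predicted vs observed
cm <- table(Predicted = pred_class, Observed = observed)
cm

# Cell counts, treating '1' as the positive class
TP <- cm["1", "1"]; FP <- cm["1", "0"]
FN <- cm["0", "1"]; TN <- cm["0", "0"]

sensitivity <- TP / (TP + FN)  # true positive rate
specificity <- TN / (TN + FP)  # true negative rate
PPV         <- TP / (TP + FP)  # positive predictive value
NPV         <- TN / (TN + FN)  # negative predictive value

c(sensitivity = sensitivity, specificity = specificity, PPV = PPV, NPV = NPV)

If you prefer a packaged solution, caret::confusionMatrix(pred_class, observed, positive = "1") reports Sensitivity, Specificity, Pos Pred Value and Neg Pred Value in one call, provided both arguments are factors with the same levels.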