I'm trying to plot a logistic regression of binary data (clinical signs) against a continuous predictor (log copy number). I can fit the model with glm() without any problem, but I'm having trouble using lines() to draw the fitted curve on the plot. Here is what my data looks like:
df.min <- structure(list(clinical.signs = structure(c(1L, 1L, 1L, 1L, 1L,
2L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 1L, 2L, 1L, 1L, 2L, 1L,
1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 1L, 1L, 2L, 2L, 2L, 2L,
1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 2L, 1L, 2L,
2L, 2L, 2L, 1L, 1L, 2L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 2L, 2L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 2L, 1L, 2L, 1L, 1L,
1L, 2L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 2L, 1L, 2L, 2L, 2L, 1L, 2L,
2L, 2L, 1L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("0", "1"), class = "factor"),
log.copy.num = c(0, 5.43372200355424, 0, 0, 0, 0, 0, 4.18965474202643,
3.42751468997953, 0, 0, 0, 0, 0, 0.824175442966349, 0, 0,
0, 0, 0, 2.97552956623647, 1.91692261218206, 1.43270073393405,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2.13179677201376, 0,
0, 0, 3.53805656437935, 0, 0, 0, 0, 0, 0, 0, 4.26127043353808,
2.54160199346455, 1.15057202759882, 4.88280192258637, 0,
0, 0, 0, 0, 3.62434093297637, 0, 0, 0, 0, 0, 0, 3.45946628978613,
0, 0, 0, 7.40913644392013, 0, 0, 0, 0, 0, 0, 0, 3.35689712276558,
0, 0, 0, 0, 4.25518708733893, 0, 0, 0, 0, 0, 0, 0, 0, 0,
3.15700042115011, 0, 2.07317192866624, 0, 7.85979918056211,
3.16124671203156, 0, 2.20386912005489, 5.04985600724954,
0, 1.45395300959371, 0, 3.28091121578765, 3.83945231259331,
2.54160199346455, 2.66722820658195, 2.2512917986065, 7.53955882930103,
6.30261897574491, 6.96696713861398)), class = c("tbl_df",
"tbl", "data.frame"), row.names = c(NA, -110L)
)
And here is my script:
#logistic regression using glm
logimodel <- glm(clinical.signs ~ log.copy.num, data = df.min, family = "binomial")
summary(logimodel)
#plot the logistic regression above
xaxis <- seq(min(df.min$log.copy.num), max(df.min$log.copy.num), 0.1)
yaxis <- predict(logimodel, list(log.copy.num=xaxis), type = "response")
plot(xaxis, yaxis)
plot(df.min$log.copy.num, df.min$clinical.signs)
lines(xaxis,yaxis, col = "blue")
Thank you for any guidance on what I'm sure is a foolish oversight!
Your clinical.signs column is a factor:
class(df.min$clinical.signs)
[1] "factor"
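Hence, when you plot it, the factor is converted to its integer codes, 1 and 2, while the values in yaxis are in the 0-1 range (they are the predicted probabilities of being "1"), so the blue line ends up largely below the plotted range. To put both on the same scale, do: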
plot(df.min$log.copy.num, as.numeric(df.min$clinical.signs)-1,
ylab="clinical signs",xlab="log.copy.num")
lines(xaxis,yaxis, col = "blue")
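If it helps to see the whole thing in one place, here is a minimal self-contained sketch of the same idea. It uses simulated stand-in data rather than your df.min, so the names toy, signs01, grid and prob are made up for illustration. It also converts the factor through its labels with as.numeric(as.character(...)), which I'd assume is a bit safer than subtracting 1 because it does not depend on the levels being exactly "0" and "1" in that order.

```r
# Simulated stand-in data (hypothetical; not the df.min from the question)
set.seed(1)
x <- c(rep(0, 60), runif(40, 0.5, 8))
y <- factor(rbinom(100, 1, plogis(-1 + 0.6 * x)), levels = c("0", "1"))
toy <- data.frame(signs = y, log.copy.num = x)

# A factor is stored as integer codes starting at 1, so this gives 1s and 2s
codes <- as.numeric(toy$signs)

# Converting via the labels recovers the 0/1 values
signs01 <- as.numeric(as.character(toy$signs))

# Fit the model and predict probabilities over a grid of x values
fit <- glm(signs ~ log.copy.num, data = toy, family = binomial)
grid <- seq(min(toy$log.copy.num), max(toy$log.copy.num), by = 0.1)
prob <- predict(fit, newdata = data.frame(log.copy.num = grid),
                type = "response")

# Points and fitted curve now share the same 0-1 scale
plot(toy$log.copy.num, signs01,
     xlab = "log.copy.num", ylab = "clinical signs")
lines(grid, prob, col = "blue")
```

Passing ylim = c(0, 1) to plot() would also make sure the curve can never fall outside the plotting region.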