Tags: r, plot, line, logistic-regression, glm

Logistic Regression Model not appearing on Plot() - Appears to be lines() issue


I'm trying to create a graph representing a logistic regression of binary data (clinical signs) against a continuous predictor (log copy number). I can fit the model with glm() without any problem, but I am having trouble using the lines() function to draw the fitted regression curve on the plot. Here is what my data looks like:

    df.min <- structure(list(clinical.signs = structure(c(1L, 1L, 1L, 1L, 1L, 
                                                          2L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 1L, 2L, 1L, 1L, 2L, 1L, 
                                                          1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 1L, 1L, 2L, 2L, 2L, 2L, 
                                                          1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 2L, 1L, 2L, 
                                                          2L, 2L, 2L, 1L, 1L, 2L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 2L, 2L, 
                                                          1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 2L, 1L, 2L, 1L, 1L, 
                                                          1L, 2L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 2L, 1L, 2L, 2L, 2L, 1L, 2L, 
                                                          2L, 2L, 1L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("0", "1"), class = "factor"), 
                             log.copy.num = c(0, 5.43372200355424, 0, 0, 0, 0, 0, 4.18965474202643, 
                                              3.42751468997953, 0, 0, 0, 0, 0, 0.824175442966349, 0, 0, 
                                              0, 0, 0, 2.97552956623647, 1.91692261218206, 1.43270073393405, 
                                              0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2.13179677201376, 0, 
                                              0, 0, 3.53805656437935, 0, 0, 0, 0, 0, 0, 0, 4.26127043353808, 
                                              2.54160199346455, 1.15057202759882, 4.88280192258637, 0, 
                                              0, 0, 0, 0, 3.62434093297637, 0, 0, 0, 0, 0, 0, 3.45946628978613, 
                                              0, 0, 0, 7.40913644392013, 0, 0, 0, 0, 0, 0, 0, 3.35689712276558, 
                                              0, 0, 0, 0, 4.25518708733893, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
                                              3.15700042115011, 0, 2.07317192866624, 0, 7.85979918056211, 
                                              3.16124671203156, 0, 2.20386912005489, 5.04985600724954, 
                                              0, 1.45395300959371, 0, 3.28091121578765, 3.83945231259331, 
                                              2.54160199346455, 2.66722820658195, 2.2512917986065, 7.53955882930103, 
                                              6.30261897574491, 6.96696713861398)),
                        class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -110L))

And here is my script:

    # logistic regression using glm
    logimodel <- glm(clinical.signs ~ log.copy.num, data = df.min, family = "binomial")
    summary(logimodel)

    # plot the logistic regression above
    xaxis <- seq(min(df.min$log.copy.num), max(df.min$log.copy.num), 0.1)
    yaxis <- predict(logimodel, list(log.copy.num=xaxis), type = "response")
    plot(xaxis, yaxis)
    plot(df.min$log.copy.num, df.min$clinical.signs)
    lines(xaxis,yaxis, col = "blue")

Thank you for any guidance on what I'm sure is a foolish oversight!


Solution

  • You have clinical.signs stored as a factor:

    class(df.min$clinical.signs)
    [1] "factor"
    

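    Hence, when you plot it, the factor is converted to its underlying integer codes (1s and 2s), while your yaxis values lie in the 0-1 range (because they are predicted probabilities of being "1"). The curve is therefore drawn at or below the bottom edge of the plotting region, which spans 1 to 2. To put both on the same scale, do: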

    plot(df.min$log.copy.num, as.numeric(df.min$clinical.signs)-1,
    ylab="clinical signs",xlab="log.copy.num")
    lines(xaxis,yaxis, col = "blue")
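The scale mismatch described above can be reproduced on a small example. The following is a minimal sketch, assuming a hypothetical toy data frame `toy` that stands in for `df.min` (the values are made up for illustration and are not the asker's data). It also uses `as.numeric(as.character(...))` instead of subtracting 1; that variant is a substitution on my part which recovers the level labels themselves, so it does not rely on the codes happening to be 1 and 2.

```r
# Hypothetical toy data standing in for df.min (illustrative values only)
toy <- data.frame(
  clinical.signs = factor(c(0, 0, 1, 0, 1, 0, 1, 1, 0, 1)),
  log.copy.num   = c(0, 0, 0, 1.2, 1.9, 2.5, 3.4, 4.2, 5.0, 7.4)
)

# A factor with levels "0"/"1" is stored internally as integer codes 1/2,
# and those codes are what plot() draws when given the factor as y
codes <- as.numeric(toy$clinical.signs)

# Going through as.character() recovers the labels 0/1 instead of the codes
y01 <- as.numeric(as.character(toy$clinical.signs))

# Fit the model and predict probabilities on a grid of predictor values
fit  <- glm(clinical.signs ~ log.copy.num, data = toy, family = "binomial")
grid <- seq(min(toy$log.copy.num), max(toy$log.copy.num), by = 0.1)
prob <- predict(fit, newdata = data.frame(log.copy.num = grid),
                type = "response")

# Points and fitted curve now share the same 0-1 vertical scale
plot(toy$log.copy.num, y01, ylim = c(0, 1),
     xlab = "log.copy.num", ylab = "clinical signs")
lines(grid, prob, col = "blue")
```

Comparing `range(codes)` with `range(prob)` is a quick way to check for this kind of mismatch before calling lines().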