r, ggplot2, logistic-regression

Plotting predictions from a logistic regression


I am trying to plot a logistic regression in R. I currently have this code...

mylogit<- glm(Breeding.success ~ Dam.Age, family = binomial, data = captive)
summary(mylogit)

predicted.data <- as.data.frame(predict(mylogit, type="response", se.fit=TRUE))
summary(predicted.data)

new.data <- cbind(captive, predicted.data)

graph <- ggplot(captive, aes(x=Dam.Age, y=Breeding.success)) +
  geom_point() +
  stat_smooth(method="glm", method.args = list(family="binomial"), se=FALSE) +
  labs(x="Dam age", y="Breeding success")

I currently have a graph with a straight line, which I would like to be curved and smooth. Also I am struggling with plotting the confidence intervals. Any advice would be great, thanks.

I can give you the actual data used - http://datadryad.org/resource/doi:10.5061/dryad.58ff4.

I am reproducing some of the graphs as a part of a final year project. This code is for the breeding success plotted against the dam age.


Solution

  • The main issue is that the logistic curve you're plotting is approximately linear over the range of data you've got (this is generally true when the predicted probabilities are in the range from 0.3 to 0.7).

    You can get a confidence band on the plot by specifying se=TRUE in the stat_smooth() (or geom_smooth()) call.

    In the plot produced by the code below I (1) used stat_sum() instead of geom_point() to visualize the overlapping points in the data set; (2) used fullrange=TRUE to get predictions over the full range of the plot (rather than just the range actually spanned by the data); and (3) used expand_limits() to push the graph out to large age values, to illustrate that the prediction does look nonlinear once you extend it to low or high enough predicted probabilities (to reach high probabilities, you would need to make age negative).

    # Download the data from Dryad and read it in
    download.file("http://datadryad.org/bitstream/handle/10255/dryad.141600/All%20females%20breeding%20success.csv?sequence=1",
                  destfile="breeding_success.csv")

    captive <- read.csv("breeding_success.csv")
    library(ggplot2)
    graph <- ggplot(captive, aes(x=Dam.Age, y=Breeding.success)) +
        stat_sum() +                    # point size shows the number of overlapping observations
        stat_smooth(method="glm",
                    method.args = list(family="binomial"), se=TRUE,
                    fullrange=TRUE) +   # fit and confidence band across the whole x axis
        labs(x="Dam age", y="Breeding success") +
        expand_limits(x=20)             # extend the x axis out to age 20
    print(graph)
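
If you would rather build the confidence band yourself (which is roughly what the predict() call in the question is heading towards), one possible approach is to predict on the link (log-odds) scale and back-transform the limits. This is only a minimal sketch: it uses simulated data in place of the real captive data frame, the simulation coefficients are arbitrary, and sim, fit and grid are hypothetical names chosen for illustration.

```r
# Hypothetical data standing in for the real `captive` data frame;
# the coefficients used to simulate it are arbitrary.
set.seed(1)
sim <- data.frame(Dam.Age = sample(2:15, 200, replace = TRUE))
sim$Breeding.success <- rbinom(nrow(sim), size = 1,
                               prob = plogis(1 - 0.15 * sim$Dam.Age))

fit <- glm(Breeding.success ~ Dam.Age, family = binomial, data = sim)

# Predict on a regular grid of ages, on the link (log-odds) scale.
grid <- data.frame(Dam.Age = seq(0, 20, by = 0.5))
pred <- predict(fit, newdata = grid, type = "link", se.fit = TRUE)

# Build an approximate 95% band on the link scale, then back-transform
# with plogis() so that the limits stay inside [0, 1].
z <- qnorm(0.975)
grid$fit   <- plogis(pred$fit)
grid$lower <- plogis(pred$fit - z * pred$se.fit)
grid$upper <- plogis(pred$fit + z * pred$se.fit)

head(grid)
```

A data frame like grid can then be drawn with geom_line(aes(y = fit)) and geom_ribbon(aes(ymin = lower, ymax = upper), alpha = 0.2). Working on the link scale avoids limits below 0 or above 1, which adding and subtracting standard errors on the response scale can produce.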
    

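
To see why the fitted curve looks like a straight line when the predicted probabilities stay between 0.3 and 0.7, you can compare the logistic function with its tangent line at probability 0.5 over that range. The sketch below uses only base R. The variable names are illustrative, and the tangent line (slope 1/4 on the linear-predictor scale) is just one convenient straight-line reference, not the line ggplot2 draws.

```r
# Linear-predictor values corresponding to probabilities from 0.3 to 0.7.
eta <- seq(qlogis(0.3), qlogis(0.7), length.out = 101)

# Logistic curve versus its tangent line at eta = 0 (slope 1/4).
p_logistic <- plogis(eta)
p_linear   <- 0.5 + eta / 4

# Largest gap between the curve and the straight line over this range.
max_gap <- max(abs(p_logistic - p_linear))
max_gap
```

The largest gap is about 0.012 on the probability scale, which is hard to see by eye on a plot whose y axis runs from 0 to 1.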