I have a data frame of mammal genera. Each row of the column is a different genus. There are three columns: a column of each genus's geographic range size (a continuous variable), a column stating whether or not a genus is found inside or outside of river basins (a binary variable), and a column stating whether the genus is found in the fossil record (a binary variable).
I have performed a multiple logistic regression to see if geographic range size and presence in/out of basins is a predictor of presence in the fossil record using the following R code.
Regression<-glm(df[ ,"FossilRecord"] ~ log(df[ ,"Geographic Range"]) + df[ ,"Basin"], family="binomial")
I am trying to find a way to visually summarize the output of this regression (other than a table of the regression summary).
I know how to do this for a single variable regression. For example, I could use a plot like if I wanted to see the relationship between just geographic range size and presence in the fossil record.
However, I do not know how to make a similar or equivalent plot when there are two independent variables, and one of them is binary. What are some plotting and data visualization techniques I could use in this case?
Thanks for the help!
Visualization is important and yet it can be very hard. With your example, I would recommend plotting one line for predicted FossilRecord versus GeographicRange for each level of your categorical covariate (Basin). Here's an example of how to do it with the ggplot2 package
##generating data
ssize <- 100
set.seed(12345)
dat <- data.frame(
Basin = rbinom(ssize, 1,.4),
GeographicRange = rnorm(ssize,10,2)
)
dat$FossilRecord = rbinom(ssize,1,(.3 + .1*dat$Basin + 0.04*dat$GeographicRange))
##fitting model
fit <- glm(FossilRecord ~ Basin + GeographicRange, family=binomial(), data=dat)
We can use the predict()
function to obtain predicted response values for many GeographicRange values and for each Basin category.
##getting predicted response from model
plotting_dfm <- expand.grid(GeographicRange = seq(from=0, to = 20, by=0.1),
Basin = (0:1))
plotting_dfm$preds <- plogis( predict(fit , newdata=plotting_dfm))
Now you can plot the predicted results:
##plotting the predicted response on the two covariates
library(ggplot2)
pl <- ggplot(plotting_dfm, aes(x=GeographicRange, y =preds, color=as.factor(Basin)))
pl +
geom_point( ) +
ggtitle("Predicted FossilRecord by GeoRange and Basin") +
ggplot2::ylab("Predicted FossilRecord")