I'm analysing a data set with the weight of newborn babies and some info about their mothers, including a categorical variable 'smoke' - whether a mother is a smoker, or not.
I did an aov test and wanted to plot diagnostic plots of an ANOVA model with its help. I expected to get four plots, including a 'Residuals vs Factor Levels' plot. Instead, I got a 'Residuals vs Leverage' plot, as if my categorical variable was a numeric one.
You can find the dataset here: https://drive.google.com/file/d/1VwiAHdYZF2BrGZZ875GGdkyamKMgxmGU/view?usp=sharing
In there variable 'smoke' has values 0 (non-smoker) and 1 (smoker). I used mutate to change it into a proper factor (among others, like parity), then made the aov test itself and tried to plot the results, to verify the assumptions. Below you can find my code:
babies <- read.csv("babies.csv")
babies <- babies %>%
mutate(parity = factor(parity,
levels = c(0, 1),
labels = c("not firstborn", "firstborn"))) %>%
mutate(smoke = factor(smoke,
levels = c(0, 1),
labels = c("non smoker", "smoker")))
model6 <- aov(babies$bwt ~ babies$smoke)
par(mfrow = c(2,2))
plot(aov(babies$bwt ~ babies$smoke))
The result I'm getting in the fourth plot is this:
I tried to check whether 'smoke' is a factor as I wanted or not, like that:
> head(babies$smoke)
[1] non smoker non smoker smoker non smoker smoker non smoker
Levels: non smoker smoker
Since 'smoke' is a factor (as I understand) and a categorical variable, why is there leverage as per numeric variable? How to fix this and get the proper plot?
Thanks for the help in advance!
There's no plot.aov
method, so you got the plot.lm
method (since the class of aov(babies$bwt ~ babies$smoke)
would be c("aov", "lm")
).
If you want to plot something else for "aov"
objects, write a function plot.aov
to do it. For example,
plot.aov <- function(x, which = 1:3, ...) {
NextMethod(x, which = which, ...)
for (f in names(x$xlevels)) {
xvar <- x$model[, f]
plot(xvar, residuals(x), xlab = f, ylab = "Residuals")
}
}
Then
plot(aov(babies$bwt ~ babies$smoke))
will plot the first 3 plot.lm
plots, followed by residual plots against the factor in your model. (It will do each of the factors if there are others.) I get
Alternatively, you can use the termplot
function, e.g.
termplot(aov(babies$bwt ~ babies$smoke), partial.resid=TRUE)
which gives this plot: