Search code examples
ranova

Why R plots Residuals vs Leverage instead of Residuals vs Factor Levels (ANOVA test & model with 'aov')


I'm analysing a data set with the weight of newborn babies and some info about their mothers, including a categorical variable 'smoke' - whether a mother is a smoker, or not.

I did an aov test and wanted to plot diagnostic plots of an ANOVA model with its help. I expected to get four plots, including a 'Residuals vs Factor Levels' plot. Instead, I got a 'Residuals vs Leverage' plot, as if my categorical variable was a numeric one.

You can find the dataset here: https://drive.google.com/file/d/1VwiAHdYZF2BrGZZ875GGdkyamKMgxmGU/view?usp=sharing

In there variable 'smoke' has values 0 (non-smoker) and 1 (smoker). I used mutate to change it into a proper factor (among others, like parity), then made the aov test itself and tried to plot the results, to verify the assumptions. Below you can find my code:

babies <- read.csv("babies.csv")
babies <- babies %>% 
mutate(parity = factor(parity, 
                     levels = c(0, 1), 
                     labels = c("not firstborn", "firstborn"))) %>% 
mutate(smoke = factor(smoke, 
                    levels = c(0, 1), 
                    labels = c("non smoker", "smoker")))

model6 <- aov(babies$bwt ~ babies$smoke)
par(mfrow = c(2,2))
plot(aov(babies$bwt ~ babies$smoke))

The result I'm getting in the fourth plot is this:

enter image description here

I tried to check whether 'smoke' is a factor as I wanted or not, like that:

> head(babies$smoke)
[1] non smoker non smoker smoker     non smoker smoker     non smoker
Levels: non smoker smoker

Since 'smoke' is a factor (as I understand) and a categorical variable, why is there leverage as per numeric variable? How to fix this and get the proper plot?

Thanks for the help in advance!


Solution

  • There's no plot.aov method, so you got the plot.lm method (since the class of aov(babies$bwt ~ babies$smoke) would be c("aov", "lm")).

    If you want to plot something else for "aov" objects, write a function plot.aov to do it. For example,

    plot.aov <- function(x, which = 1:3, ...) {
      NextMethod(x, which = which, ...)
      for (f in names(x$xlevels)) {
        xvar <- x$model[, f]
        plot(xvar, residuals(x), xlab = f, ylab = "Residuals")
      }
    }
    

    Then

    plot(aov(babies$bwt ~ babies$smoke))
    

    will plot the first 3 plot.lm plots, followed by residual plots against the factor in your model. (It will do each of the factors if there are others.) I get

    screenshot

    Alternatively, you can use the termplot function, e.g.

    termplot(aov(babies$bwt ~ babies$smoke), partial.resid=TRUE)
    

    which gives this plot:

    enter image description here