Tags: r, plot, statistics, regression, linear-regression

Is it possible to use the identify function in R for automatically generated Q-Q plots?


I am taking an introductory course in linear regression in college this semester. For one of my assignments, I am required to analyse a dataset using R.

Allow me to first share part of my code:

log_Metab <- log(Metab)
mammal.lm.1 <- lm(Life ~ log_Metab)
plot(mammal.lm.1, which = 2)

My dataset contains the metabolic rate (Metab) and lifespan (Life) of 95 different mammals, and I need to check whether there is a linear relationship between the two.

Now, the third line of the code that I pasted generates the normal Q-Q plot of the linear regression, as shown below:

[Figure: Normal Q-Q plot]

My question is the one in the title: is it possible to use the identify function on a plot like this Q-Q plot? The three numbered observations in the plot are selected automatically by R, not by me. If it is possible, please show and explain the code I should type. For example, how can I identify the point immediately to the left of the 90th observation?

P.S. I apologise in advance if this is something trivial, but I have only been using R for about a month and this is already beyond the scope of what I have learnt :)


Solution

  • It is possible to do what you want by computing the coordinates separately from the plot. First we need reproducible data since you did not provide any. The data set mtcars comes with R (as do many other data sets):

    data(mtcars)
    log_hp <- log(mtcars$hp)
    mpg.lm <- lm(mpg ~ log_hp, data = mtcars)
    

    We have fitted a linear regression of mpg (miles per gallon) on the log of hp (horsepower). The command plot(mpg.lm) dispatches to the plot method for lm objects, plot.lm, which draws four diagnostic plots by default. The manual page ?plot.lm shows that the plot you want is the second one, which we can request with:

    plot(mpg.lm, which=2)
    

    Now we need the standardized residuals and the theoretical quantiles:

    mpg.res <- rstandard(mpg.lm)
    out <- qqnorm(mpg.res, plot.it=FALSE)
    coords <- cbind(x=out$x, y=out$y)
    

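    The matrix coords holds the theoretical quantiles (x) and the standardized residuals (y), and its row names are the car names. That is everything identify needs. With the Q-Q plot still open, run the command below, click near the points you want to label, and then end the selection (typically by right-clicking or pressing Esc, depending on the graphics device). I'll make the labels red: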

    identify(coords, labels=rownames(coords), cex=.75, col="red")
    

    [Figure: Q-Q plot with the identified points labelled in red]
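
If you would rather not click, the same coordinates can be used to label a point programmatically. The sketch below is one possible approach, not part of the original answer: left_neighbour is a hypothetical helper written for this illustration (it is not a base R function), and the rightmost point is used as the reference observation purely as an example. It looks up the point immediately to the left of a named observation on the theoretical-quantile axis and labels it with text().

```r
# Rebuild the Q-Q coordinates the same way as in the answer.
data(mtcars)
log_hp  <- log(mtcars$hp)
mpg.lm  <- lm(mpg ~ log_hp, data = mtcars)
mpg.res <- rstandard(mpg.lm)
out     <- qqnorm(mpg.res, plot.it = FALSE)
coords  <- cbind(x = out$x, y = out$y)

# Hypothetical helper (not in base R): name of the observation that sits
# immediately to the left of `target` on the x axis of the Q-Q plot.
left_neighbour <- function(coords, target) {
  ord <- order(coords[, "x"])                   # row indices sorted by x
  pos <- match(target, rownames(coords)[ord])   # rank of target along x
  if (is.na(pos)) stop("unknown observation: ", target)
  if (pos == 1) return(NA_character_)           # target is already leftmost
  rownames(coords)[ord[pos - 1]]
}

# Example reference point (an assumption for illustration): the rightmost one.
target <- rownames(coords)[which.max(coords[, "x"])]
nb     <- left_neighbour(coords, target)

# Draw the Q-Q plot and label the neighbour in red, to the left of the point.
plot(mpg.lm, which = 2)
text(coords[nb, "x"], coords[nb, "y"], labels = nb,
     pos = 2, cex = 0.75, col = "red")
```

For your own model you would replace target with the row name of the observation you care about (for example "90" if your data frame uses default row names), then call left_neighbour(coords, target) in the same way.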