
Fitting a line to a qqplot with two datasets in R


I have two datasets and I used the qqplot function in R to compare their distributions. I would like to fit a line through the plot, but the qqline function is not appropriate since it expects a single dataset and I have two. Any idea what I can use?

x <- rnorm(10000, 12, 3)
y <- rnorm(10000, 18, 5)
qqplot(x,y)
abline(lm(y~x))

Solution

  • Remember that, when x and y have the same length, a qqplot of x versus y is the same as plot(sort(x), sort(y)):

    x <- rnorm(10000, 12, 3) 
    y <- rnorm(10000, 18, 5)
    
    qqplot(x, y)
    

    
    plot(sort(x), sort(y))
    

    The problem in your example is that lm(y ~ x) fits the regression of y on x in their original order, not of the sorted values that qqplot displays. Effectively, you are plotting the regression line for this:

    plot(x, y)
    


    Because x and y are independent, the result is, not surprisingly, an almost perfectly flat line with an intercept approximately equal to the mean of y.

    Instead, you can regress the sorted y on the sorted x to get the regression line for the qqplot:

    x <- rnorm(10000, 12, 3) 
    y <- rnorm(10000, 18, 5)
    
    qqplot(x, y)
    abline(lm(sort(y) ~ sort(x)), col = "red", lwd = 2, lty = 2)
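
The sort-based approach above relies on x and y having the same length. As a rough sketch for the more general case, qqplot invisibly returns the points it plots, so you can fit the line to those instead. The unequal sample sizes and the seed below are assumptions made only for illustration; they are not from the question. The second line is a different technique, in the spirit of qqline: it passes through the first and third quartiles of both samples rather than being a least-squares fit. When both samples are roughly normal, both slopes should land near sd(y) / sd(x).

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible

# Hypothetical samples of different sizes (assumption, not from the question)
x <- rnorm(10000, 12, 3)
y <- rnorm(4000, 18, 5)

# plot.it = FALSE returns the Q-Q points without drawing them;
# the longer sample is interpolated down to the length of the shorter one
qq <- qqplot(x, y, plot.it = FALSE)

# Least-squares line through the Q-Q points
fit <- lm(qq$y ~ qq$x)

# Alternative: a line through the first and third quartiles of both samples
qx <- quantile(x, c(0.25, 0.75))
qy <- quantile(y, c(0.25, 0.75))
slope <- unname(diff(qy) / diff(qx))
intercept <- unname(qy[1] - slope * qx[1])

qqplot(x, y)
abline(fit, col = "red", lwd = 2, lty = 2)        # least-squares line
abline(intercept, slope, col = "blue", lwd = 2)   # quartile-based line
```

The quartile-based line is less sensitive to the tails, which is usually where a Q-Q plot departs from a straight line, so it may be the better reference when the goal is to judge how far the tails deviate.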