My datasets are pretty large, and rendering the generated QQ plots is slow and sometimes even freezes my browser. I know that one option is simply to downsample the data vector. However, I wanted to try hex binning instead of downsampling. Unfortunately, I couldn't make it work (two of my several attempts are shown below). If this kind of data reduction can be achieved with hex binning (which I suspect it can, since it's similar to histograms), I'd appreciate it if someone could show me how to do it. I use ggplot2. Thanks!
g <- ggplot(df, aes(x=var)) + stat_qq(aes(x = var), geom = "hex")
g <- ggplot(df, aes(x = var, y = ..density..)) +
geom_hex(aes(sample = var), stat = "qq")
print (g)
The first call results in the following error message:
Error: stat_qq requires the following missing aesthetics: sample
The second call results in this message:
Error in eval(expr, envir, enclos) : object 'density' not found
UPDATE: I think a more correct variant is this, but I'm not sure what the arguments should be:
g <- ggplot(df, aes(??, ??)) + stat_binhex()
Not sure if this is exactly what you are looking for, but here are a couple of ways to do 2D binning. The first uses ggplot, which you are trying to work with, and the second uses the hexbin package, which looks better to me, but that's just my preference.
library(ggplot2)
x <- rgamma(1000,8,2)
y <- rnorm(1000,4,1.5)
binFrame <- data.frame(x,y)
qplot(x,y,data=binFrame, geom='bin2d') # with ggplot...rectangular binning actually
library(hexbin)
hexbinplot(y~x, data=binFrame) # with hexbin...actually hexagonal binning
Edit:
So I was thinking a bit about this at lunch, and I think the fundamental issue is that hexbinning is a multidimensional data reduction technique, whereas it seems you are trying to do univariate QQ plots on a really large sample, but with hexbins in ggplot. At any rate, I can think of a way to do hexbin plots with ggplot: the best I came up with is to start from scratch and manually construct both the theoretical quantiles (x) and the sample quantiles (y). So here is what I came up with.
Basic QQ-Plot Manually
# set up a manual QQ plot, used below both with and without hexbins
xSamp <- rgamma(1000,8,.5) # sample data
len <- length(xSamp) # sample size
i <- seq_len(len) # ranks 1..len
probSeq <- (i-.5)/len # probability grid
invCDF <- qnorm(probSeq,0,1) # theoretical quantiles for standard normal, but you could compare your sample to any distribution
orderGam <- sort(xSamp) # ordered sample
df <- data.frame(invCDF,orderGam)
plot(invCDF,orderGam,xlab="Standard Normal Theoretical Quantiles",ylab="Sample Quantiles",main="QQ-Plot")
abline(lm(orderGam~invCDF),col="red",lwd=2)
QQ Plot With Hexbins in ggplot:
ggplot(df, aes(invCDF, orderGam)) + stat_binhex() + geom_smooth(method="lm")
![QQ Plot with ggplot][2]
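If you want to reuse the manual construction above on a bigger vector, here is a minimal sketch that wraps it in a function. The helper name qq_frame, its qfun argument, the simulated gamma data, and the bins = 60 setting are all my own assumptions for illustration; none of them come from ggplot2 itself. Note that stat_binhex() needs the hexbin package to be installed.

```r
library(ggplot2)  # stat_binhex() also requires the hexbin package

# Hypothetical helper (not part of ggplot2): pairs theoretical quantiles
# with the ordered sample, using the same (i - 0.5) / n probability grid.
qq_frame <- function(x, qfun = qnorm, ...) {
  x <- sort(x[!is.na(x)])          # ordered sample, NAs dropped
  n <- length(x)
  p <- (seq_len(n) - 0.5) / n      # probability grid
  data.frame(theoretical = qfun(p, ...), sample = x)
}

set.seed(1)
big <- rgamma(1e5, shape = 8, rate = 0.5)  # simulated stand-in for a large dataset
qq <- qq_frame(big)

# Hexagons summarise the points, so the plot draws far fewer shapes than rows.
g <- ggplot(qq, aes(theoretical, sample)) +
  stat_binhex(bins = 60) +
  labs(x = "Standard Normal Theoretical Quantiles", y = "Sample Quantiles")
print(g)
```

To compare against another distribution you could pass its quantile function, e.g. qq_frame(big, qfun = qgamma, shape = 8, rate = 0.5).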
So at the end of the day, this might not scale up readily. But if you are looking to do true multidimensional tests of normality, you might think about chi-square plots for multivariate normality. Cheers.
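For what it's worth, a chi-square plot can be sketched in base R along the same lines as the manual QQ plot: order the squared Mahalanobis distances and plot them against chi-square quantiles with degrees of freedom equal to the number of variables. The simulated matrix X and its dimensions below are assumptions for illustration only; swap in your own numeric matrix.

```r
set.seed(42)
n <- 2000                                   # number of observations (assumed)
p <- 3                                      # number of variables (assumed)
X <- matrix(rnorm(n * p), ncol = p)         # simulated multivariate normal data

# mahalanobis() returns squared distances from the centre
d2 <- mahalanobis(X, center = colMeans(X), cov = cov(X))

chi_df <- data.frame(
  theoretical = qchisq((seq_len(n) - 0.5) / n, df = p),  # chi-square quantiles
  sample      = sort(d2)                                 # ordered squared distances
)

plot(chi_df$theoretical, chi_df$sample,
     xlab = "Chi-Square Theoretical Quantiles",
     ylab = "Ordered Squared Mahalanobis Distances",
     main = "Chi-Square Plot")
abline(0, 1, col = "red", lwd = 2)          # reference line y = x
```

If the data are roughly multivariate normal, the points should fall close to the red line. Since chi_df is just another pair of x/y columns, it could be fed to stat_binhex() the same way as the QQ data.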