Tags: r, ggplot2, binning

Manually specifying bins with stat_summary2d


I have a large data set consisting of coordinates (x,y) and a numeric z value that is similar to a density. I'm interested in binning the data, computing summary statistics per bin (median, length, etc.) and plotting the binned values as points, with the statistics mapped to ggplot aesthetics.

I've tried using stat_summary2d and extracting the results manually (based on this answer: https://stackoverflow.com/a/22013347/2832911). However, the problem I'm running into is that the bin placement is based on the range of the data, which in my case varies from data set to data set. As a result, the bins in two different plots do not cover the same area.

My question is how to either manually set bins using stat_summary2d, or at least set them to be consistent regardless of the data.

Here is a basic example which demonstrates the approach and how the bins don't line up:

library(ggplot2)
set.seed(2)
df1 <- data.frame(x=runif(100, -1,1), y=runif(100, -1,1), z=rnorm(100))
df2 <- data.frame(x=runif(100, -1,1), y=runif(100, -1,1), z=rnorm(100))
g1 <- ggplot(df1, aes(x,y))+stat_summary2d(fun=mean, bins=10, aes(z=z))+geom_point()
df1.binned <-
    data.frame(with(ggplot_build(g1)$data[[1]],
                    cbind(x=(xmax+xmin)/2, y=(ymax+ymin)/2, z=value, df=1)))
g2 <- ggplot(df2, aes(x,y))+stat_summary2d(fun=mean, bins=10, aes(z=z))+geom_point()
df2.binned <-
    data.frame(with(ggplot_build(g2)$data[[1]],
                    cbind(x=(xmax+xmin)/2, y=(ymax+ymin)/2, z=value, df=2)))
df.binned <- rbind(df1.binned, df2.binned)
ggplot(df.binned, aes(x,y, size=z, color=factor(df)))+geom_point(alpha=.5)

This produces a plot in which the bin centers of the two data sets do not line up.

In reality I will call stat_summary2d several times to get, for instance, the number of points in each bin and the median, and then use aes(size=bin.length, colour=bin.median).

Any tips on how to accomplish this, either with my proposed approach or with an alternative one, would be welcome.


Solution

  • You can manually set breaks with stat_summary2d. If you want 10 bins from -1 to 1, you can do:

    bb<-seq(-1,1,length.out=10+1)
    breaks<-list(x=bb, y=bb)
    

    Then pass the breaks variable when you build each plot, so that every data set is binned on the same grid:

    g1 <- ggplot(df1, aes(x,y))+
        stat_summary2d(fun=mean, breaks=breaks, aes(z=z))+
        geom_point()
    

    It's a shame you can't change the geom of stat_summary2d to "point" and do this in one go, but it doesn't look as though stat_summary2d calculates the proper x and y values for that.
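
If the goal is only the binned points, one alternative is to skip stat_summary2d and do the binning in base R on the same fixed edges. The sketch below assumes the data frames from the question (df1 and df2, recreated here) and the 10 bins from -1 to 1 used above. bin_summary is a hypothetical helper written for this example, not a ggplot2 function, and the column names bin.median and bin.length are simply taken from the aes() call in the question.

```r
set.seed(2)
df1 <- data.frame(x = runif(100, -1, 1), y = runif(100, -1, 1), z = rnorm(100))
df2 <- data.frame(x = runif(100, -1, 1), y = runif(100, -1, 1), z = rnorm(100))

# Fixed bin edges shared by every data set (same edges as in the answer above)
bb <- seq(-1, 1, length.out = 10 + 1)

# Hypothetical helper: bin one data frame on the given edges and
# return one row per non-empty bin with the median and count of z.
bin_summary <- function(df, breaks, id) {
  centers <- (head(breaks, -1) + tail(breaks, -1)) / 2
  # Integer bin index per point; include.lowest keeps points on the first edge
  ix <- cut(df$x, breaks, labels = FALSE, include.lowest = TRUE)
  iy <- cut(df$y, breaks, labels = FALSE, include.lowest = TRUE)
  keep <- !is.na(ix) & !is.na(iy)          # drop points outside the grid
  binned <- data.frame(x = centers[ix[keep]],
                       y = centers[iy[keep]],
                       z = df$z[keep])
  # Both calls group by the same (x, y) pairs, so their rows line up
  med <- aggregate(z ~ x + y, data = binned, FUN = median)
  cnt <- aggregate(z ~ x + y, data = binned, FUN = length)
  data.frame(x = med$x, y = med$y,
             bin.median = med$z, bin.length = cnt$z, df = id)
}

df.binned <- rbind(bin_summary(df1, bb, 1), bin_summary(df2, bb, 2))

# Plot only if ggplot2 is installed; the binning itself is base R
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(df.binned, aes(x, y, size = bin.length, colour = bin.median)) +
    geom_point(alpha = .5) +
    facet_wrap(~ df)
  print(p)
}
```

Because both data sets are cut on the same bb edges, their bin centers coincide regardless of the range of each data set. Other statistics can be added by swapping median or length for another function in the aggregate() calls.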