
Hexbin: apply function for every bin


I would like to build a hexbin plot where every bin shows the ratio between class 1 and class 2 points falling into that bin (either on a log scale or not).

library(hexbin)

x <- rnorm(10000)
y <- rnorm(10000)
h <- hexbin(x, y)
plot(h)
# class labels: the first 2000 points are class 1, the remaining 8000 are class 2
l <- as.factor(c(rep(1, 2000), rep(2, 8000)))

Any suggestions on how to implement this? Is there a way to apply a function to every bin based on bin statistics?


Solution

  • @cryo111's answer has the most important ingredient: IDs = TRUE. After that, it's just a matter of deciding what to do with the Infs and how much to scale the ratios by to get integers that produce a pretty plot.

    library(hexbin)
    library(data.table)
    
    set.seed(1)
    x = rnorm(10000)
    y = rnorm(10000)
    
    h = hexbin(x, y, IDs = TRUE)
    
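    # put all the relevant data in a data.table
    # (l holds example class labels: the pattern 1,1,1,2 repeated across all points)
    dt = data.table(x, y, l = rep(c(1, 1, 1, 2), length.out = length(x)), cID = h@cID)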
    
    # group by cID and calculate whatever statistic you like
    # in this case, ratio of 1's to 2's,
    # and then Inf's are set to be equal to the largest ratio
    dt[, list(ratio = sum(l == 1)/sum(l == 2)), keyby = cID][,
         ratio := ifelse(ratio == Inf, max(ratio[is.finite(ratio)]), ratio)][,
         # scale up (I chose a scaling manually to get a prettier graph)
         # and convert to integer and change h
         as.integer(ratio*10)] -> h@count
    
    plot(h)
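
The per-bin statistic itself can be tried out without any plotting. The following is a minimal base-R sketch, not part of the original answer: the toy `cID` and `l` vectors are made-up stand-ins for `h@cID` and the class labels, and the pseudocount of 0.5 in the log variant is an arbitrary assumption. It shows the capped ratio used above next to a smoothed log ratio that never produces Inf.

```r
# Toy stand-ins (hypothetical values): in practice cID would be h@cID from
# hexbin(x, y, IDs = TRUE) and l the class label of each point.
cID <- c(1, 1, 1, 2, 2, 3, 3, 3)
l   <- c(1, 2, 2, 1, 1, 1, 1, 2)

# Per-bin class counts, ordered by bin ID
n1 <- as.vector(tapply(l == 1, cID, sum))
n2 <- as.vector(tapply(l == 2, cID, sum))

# Option 1: raw ratio; bins with no class-2 points give Inf,
# which is replaced by the largest finite ratio
ratio <- n1 / n2
ratio[is.infinite(ratio)] <- max(ratio[is.finite(ratio)])

# Option 2: log ratio with a pseudocount (0.5 is an assumed, arbitrary choice),
# which stays finite even when one class is absent from a bin
log_ratio <- log2((n1 + 0.5) / (n2 + 0.5))

print(ratio)
print(log_ratio)
```

Log ratios can be negative, so they would likely need to be shifted and scaled before being written to h@count in the way the answer does with the plain ratio.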
    
