Search code examples
rmatrixbinning

2D irregular aggregation of a matrix


I'm trying to bin a symmetric matrix with irregular intervals in R but am not sure how to proceed. My ideas are:

  • Reshape the matrix to long format, aggregate and cast it back?
  • Bin as-is in both dimensions (somehow... tapply, aggregate?)
  • Keep the regular binning but for each of my (larger) irregular bins, replace all inner values with their sum?

Here's an example of what I'm trying to do:

set.seed(42)

# symmetric matrix
a <- matrix(rpois(1e4, 2), 100)
a[upper.tri(a)] <- t(a)[upper.tri(a)]

image(x=1:100, y=1:100, a, asp=1, frame=F, axes=F)

# vector of irregular breaks for binning
breaks <- c(12, 14, 25, 60, 71, 89)

# white line show the desired bins
abline(h=breaks-.5, lwd=2, col="white")
abline(v=breaks-.5, lwd=2, col="white")

symmMat

(The aim being that each rectangle drawn above be filled according to the sum of values within it.) I'd appreciate any pointers of how best to approach this.


Solution

  • This answer provides a great starting point using tapply:

    b <- melt(a)
    
    bb <- with(b, tapply(value, 
        list(
          y=cut(Var1, breaks=c(0, breaks, Inf), include.lowest=T), 
          x=cut(Var2, breaks=c(0, breaks, Inf), include.lowest=T)
        ),
        sum)
    )
    
    bb
    #          x
    # y          [0,12] (12,14] (14,25] (25,60] (60,71] (71,89] (89,Inf]
    #  [0,12]      297      48     260     825     242     416      246
    #  (12,14]      48       3      43     141      46      59       33
    #  (14,25]     260      43     261     794     250     369      240
    #  (25,60]     825     141     794    2545     730    1303      778
    #  (60,71]     242      46     250     730     193     394      225
    #  (71,89]     416      59     369    1303     394     597      369
    #  (89,Inf]    246      33     240     778     225     369      230
    

    These can then be plotted as rectangular bins using a base plot and rect — i.e.:

    library("reshape2")
    library("magrittr")
    
    bsq <- melt(bb)
    
    # convert range notation to numerics
    getNum <- . %>%
      # rm brackets
      gsub("\\[|\\(|\\]|\\)", "", .) %>%
      # split digits and convert
      strsplit(",") %>%
      unlist %>% as.numeric
    
    y <- t(sapply(bsq[,1], getNum))
    x <- t(sapply(bsq[,2], getNum))
    
    # normalise bin intensity by area
    bsq$size <- (y[,2] - y[,1]) * (x[,2] - x[,1])
    bsq$norm <- bsq$value / bsq$size
    
    # draw rectangles on top of empty plot
    plot(1:100, 1:100, type="n", frame=F, axes=F)
    rect(ybottom=y[,1], ytop=y[,2],
         xleft=x[,1], xright=x[,2], 
         col=rgb(colorRamp(c("white", "steelblue4"))(bsq$norm / max(bsq$norm)), 
                 alpha=255*(bsq$norm / max(bsq$norm)), max=255),
         border="white")
    

    enter image description here