
R: arrays - reducing length (generalization)


I need to reduce the length of (generalize) an array in R. For example, I have high-resolution data like this...

my_array=array(c(sample(0:9,32, replace=TRUE)), dim=c(4,4,2))
> my_array
, , 1

     [,1] [,2] [,3] [,4]
[1,]    2    1    8    2
[2,]    3    5    4    6
[3,]    2    8    9    6
[4,]    1    0    9    9

, , 2

     [,1] [,2] [,3] [,4]
[1,]    3    7    9    7
[2,]    9    4    9    8
[3,]    8    6    7    8
[4,]    7    6    9    9

...and I need to "generalise" it to low-resolution using the mean function like this:

, , 1

     [,1] [,2]
[1,] 2.75 5.00
[2,] 2.75 8.25

, , 2

     [,1] [,2]
[1,] 5.75 8.25
[2,] 6.75 8.25

Put simply, each 2x2 block of the original array (e.g. positions [1,1], [1,2], [2,1], [2,2]) is averaged into a single value of the resulting array (here, position [1,1]). I have tried using "apply" over the array, but I cannot make it work with "non-standard" margins. Is there a more general function like apply in R?
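
To make the target concrete, the first printed slice can be typed in by hand and individual blocks averaged directly. This is only an illustration of the desired result, not a general solution; the name `slice1` is made up for the example.

```r
# First slice of the printed example, entered column by column
slice1 <- matrix(c(2, 3, 2, 1,
                   1, 5, 8, 0,
                   8, 4, 9, 9,
                   2, 6, 6, 9), nrow = 4)

# Rows 1-2 and columns 1-2 collapse into position [1,1] of the result
mean(slice1[1:2, 1:2])  # 2.75

# Rows 3-4 and columns 3-4 collapse into position [2,2] of the result
mean(slice1[3:4, 3:4])  # 8.25
```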


Solution

  • Here's a solution very similar to the one adapted by @crwang but generalized into a function:

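    reduceMatrix <- function(x, rown, coln, fun = mean, ...) {
      # assumes nrow(x) and ncol(x) are multiples of rown and coln
      out <- matrix(NA, nrow = nrow(x) / rown, ncol = ncol(x) / coln)
      for (i in seq_len(nrow(out))) {
        for (j in seq_len(ncol(out))) {
          # rows and columns of the original matrix covered by block (i, j)
          indi <- (rown * (i - 1) + 1):(rown * i)
          indj <- (coln * (j - 1) + 1):(coln * j)
          out[i, j] <- fun(x[indi, indj], ...)
        }
      }
      out
    }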
    

    The function works on 2D arrays (matrices), so you can apply it over the 3rd dimension of my_array:

    set.seed(10)
    my_array <- array(c(sample(0:9,32, replace=TRUE)), dim=c(4,4,2))
    
    lapply(seq_len(dim(my_array)[3]), 
           function(a) reduceMatrix(my_array[,,a], 2, 2))
    
    [[1]]
         [,1] [,2]
    [1,]  2.5  4.0
    [2,]  3.5  4.5
    
    [[2]]
         [,1] [,2]
    [1,] 4.00 5.25
    [2,] 5.25 3.75
    

    The idea of this approach is to have a function that works on standalone matrices (whether they come from 3D arrays, lists, etc.), with a simple way to choose the number of rows (rown) and columns (coln) to aggregate, the function to apply (mean, median, sum), and any further arguments passed to it (e.g. na.rm).
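
As a minimal sketch of that flexibility, the snippet below restates reduceMatrix with index ranges that span the whole block (so block sizes other than 2 also work), stacks the per-slice results back into a 3D array with vapply instead of a list, and then passes sum with na.rm = TRUE. The array values are typed in from the question's printed example; the names `low_res`, `m` and `block_sums` are illustrative only, and the code assumes the matrix dimensions are exact multiples of the block size.

```r
reduceMatrix <- function(x, rown, coln, fun = mean, ...) {
  out <- matrix(NA_real_, nrow = nrow(x) / rown, ncol = ncol(x) / coln)
  for (i in seq_len(nrow(out))) {
    for (j in seq_len(ncol(out))) {
      indi <- (rown * (i - 1) + 1):(rown * i)  # rows of block (i, j)
      indj <- (coln * (j - 1) + 1):(coln * j)  # columns of block (i, j)
      out[i, j] <- fun(x[indi, indj], ...)
    }
  }
  out
}

# The 4 x 4 x 2 array printed in the question, entered column by column
my_array <- array(c(2, 3, 2, 1,  1, 5, 8, 0,  8, 4, 9, 9,  2, 6, 6, 9,
                    3, 9, 8, 7,  7, 4, 6, 6,  9, 9, 7, 9,  7, 8, 8, 9),
                  dim = c(4, 4, 2))

# vapply with a matrix template returns an array of dim c(2, 2, n_slices)
low_res <- vapply(seq_len(dim(my_array)[3]),
                  function(k) reduceMatrix(my_array[, , k], 2, 2),
                  FUN.VALUE = matrix(0, 2, 2))
dim(low_res)  # 2 2 2

# A 6 x 4 matrix with one missing value, reduced in 3 x 2 blocks
m <- matrix(1:24, nrow = 6, ncol = 4)
m[1, 1] <- NA

# na.rm = TRUE is forwarded to sum() through the ... argument
block_sums <- reduceMatrix(m, 3, 2, fun = sum, na.rm = TRUE)
block_sums
```

Without na.rm = TRUE, the block containing the NA would itself come out as NA, which is sometimes the safer behaviour for gridded data.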