R - detect and summarize changes in matrices


I have two sets of matrices. Each matrix is 100x100 in dimension and I have 240 of them (imagine each matrix was collected in a month and I have a dataset composed of 240 months of 100x100 matrices).

The values in the matrices range from 1 to 15, representing vegetation types (grass, tropical forest, tundra etc).

My first set of matrices, m1, is my control experiment. My second set of matrices, m2, is a climate change experiment where changes in climate induce changes in the values of the matrices.

Therefore, the data is represented like this:

m1: a set of 240 100x100 matrices, each matrix corresponding to a month (therefore 240 months of data). This is my control data.

m2: same as m1, but the values are different because of some changes in climate. This is my experimental data.

Here is some data:

# generate dataset 1
set.seed(4)
someData1 <- round(runif(100 * 100 * 240, min=1, max=15),digits=0)

# generate dataset2
set.seed(5)
someData2 <- round(runif(100 * 100 * 240, min=1, max=15),digits=0)

# create matrices
k = 240; n=100; m = 100
m1 <- array(someData1, c(n,m,k))
m2 <- array(someData2, c(n,m,k))

What I would like to do is compare each cell of m2 relative to m1 in this way:

  • is the value different? yes/no
  • if yes, what was the change? for example 1 to 10, or 2 to 7 and so on.

and do the same for all 240 matrices in m2 relative to all 240 matrices in m1.

By the end, I would like to be able to:

  • have a binary matrix showing whether or not there have been changes in the values;
  • have a table with the frequency of changes in each class (i.e. 1 to 10, 2 to 7 etc).

Conceptually, what I need to achieve would be something like this:

[image not available]

where, for simplicity's sake, I drew 5x5 matrices instead of 100x100 matrices.

How can I achieve this in R?


Solution

  • To compare two matrices, use == or !=.

    what.changed <- m1 != m2 # TRUE if changed, FALSE if not
    changes <- ifelse(what.changed, paste(m1, 'to', m2), NA)
    changes # output for your small example matrices, not the 100x100 ones
         [,1]     [,2]      [,3]    
    [1,] NA       "7 to 10" "6 to 7"
    [2,] NA       NA        NA      
    [3,] "3 to 4" "6 to 8"  NA      
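
The logical result of != can also serve as the binary matrix asked for in the question. Here is a minimal sketch, assuming small made-up 3x3x2 arrays (`a` and `b` are hypothetical stand-ins for m1 and m2) and assuming that "binary" means either a 0/1 layer per month or a single matrix collapsed over all months:

```r
# Hypothetical toy data: two 3x3 "months" per experiment (stand-ins for m1 and m2)
a <- array(c(1, 2, 3, 7, 5, 6, 6, 8, 9,
             1, 2, 3, 4, 5, 6, 7, 8, 9), c(3, 3, 2))
b <- a
b[1, 2, 1] <- 10   # month 1: cell (1,2) changes from 7 to 10
b[3, 1, 2] <- 4    # month 2: cell (3,1) changes from 3 to 4

changed <- a != b                              # logical 3x3x2 array, one layer per month
changed01 <- changed * 1L                      # same shape, 1 = changed, 0 = unchanged
ever.changed <- apply(changed, c(1, 2), any)   # 3x3: TRUE if the cell changed in any month
n.changed <- apply(changed, c(1, 2), sum)      # 3x3: number of months in which the cell changed
```

Because != works elementwise on arrays of any dimension, the same lines should apply unchanged to the full 100x100x240 arrays.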
    

    Your matrices are rather large, so some sort of sparse representation might be a better fit. Rather than building a large matrix of strings ("3 to 4") in which every unchanged cell is NA, you could store only the cells where there is in fact a change.

    For example, you could create a CSV/data frame summarising your changes (using your 100x100x240 arrays to demonstrate the 3 coordinates):

    # find coordinates of changes
    change.coords <- which(m1 != m2, arr.ind = TRUE)
    colnames(change.coords) <- c('x', 'y', 'time') # whatever makes sense to your application
    changes <- data.frame(change.coords, old=m1[change.coords], new=m2[change.coords])
    head(changes)
      x y time old new
    1 1 1    1   9   4
    2 2 1    1   1  11
    3 3 1    1   5  14
    4 5 1    1  12   2
    5 6 1    1   5  11
    6 7 1    1  11   8
    

    Then you can print the changes however you like without having to store heaps of strings ("X to Y") and NAs, e.g. (don't do this with your big example matrices: there are far too many changes and it will print all of them):

    with(changes, message(sprintf("Coords (%i, %i, %i): %i to %i\n", 
           x, y, time, old, new)))
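
The question also asks for a table with the frequency of each change (1 to 10, 2 to 7 and so on). One way to get it from a data frame of old/new values is to cross-tabulate the two columns with table(). This is a sketch on made-up 2x2x2 arrays; `ctrl`, `expt`, `chg`, `trans` and `freq` are hypothetical names, not part of the code above:

```r
# Hypothetical toy stand-ins for m1 and m2 (values are vegetation classes)
ctrl <- array(c(1, 2, 2, 3, 1, 2, 3, 3), c(2, 2, 2))
expt <- array(c(1, 5, 5, 3, 4, 2, 3, 1), c(2, 2, 2))

idx <- which(ctrl != expt, arr.ind = TRUE)   # coordinates of the changed cells
chg <- data.frame(idx, old = ctrl[idx], new = expt[idx])

# Cross-tabulate: rows are the original class, columns the new class
trans <- table(old = chg$old, new = chg$new)

# Long format: one row per transition that actually occurs, with its count
freq <- as.data.frame(trans, stringsAsFactors = FALSE)
freq <- freq[freq$Freq > 0, ]
freq$change <- paste(freq$old, "to", freq$new)
```

With the `changes` data frame built earlier, the equivalent call would be table(old = changes$old, new = changes$new). With 15 classes that is at most a 15x15 table, so it stays small no matter how many cells changed.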