How to do queries for counts of matrix elements with values in given range


I'm working on a project that is looking at regrowth of trees after a deforestation event. To simplify the data set for this question, I have a matrix (converted from data frame), which has 10 columns corresponding to years 2001-2010.

-1 indicates a change point in the data, when a previously forested plot was deforested. 1 indicates when a previously deforested plot became forested. 0 indicates no change in state.

I found this link, which I think does what I need, except in Python/C++. Since I did the rest of my analyses in R, I want to stick with it.

So I was trying to translate some of the code to R, but I've been having problems.

This is my sample data set. One alternative I considered: if I could identify the index of the -1 and then the index of the 1, I could subtract the first from the second to get the difference, and then subtract 1 so that the position of the -1 itself is not counted.

# Example data
head(tcc_change)

  id   2001  2002  2003  2004  2005  2006  2007  2008 2009  2010  
1  1      0     0     0     0     0    -1     0     0    1    0   
2  2      0     0     0    -1     0     0     1     0    0    0     
3  3      0     0     0    -1     0     0     0     1    0    0  
4  4      0    -1     0     0     0     0     1     0    0    0   
5  5      0     0     0     1     0     0    -1     1    0    0 

# Indexing attempt
tcc_change$loss_init <- apply(tcc_change, 1, function(x) match(-1, x[1:10], nomatch = 99)) 
tcc_change$gain <- apply(tcc_change, 1, function(x) match(1, x[1:10], nomatch=99))

This method has a lot of problems, though. What if there's a 1 before a -1, for example? I'd like to figure out a better way to do this analysis, similar to the logical structure in the link above, but I don't know how to do this in R.

Ideally I'd like to identify points where there was deforestation (-1) and then regrowth (1) and then count the zeroes in between. The number of zeroes in between would be posted to a new column. This would give me a better idea of how long it takes for a plot to become forested after a deforestation event. If there are no zeroes in between (like row 5), I would want the code to output '0'.


Solution

  • Sorry, my function may only handle the simple case. First, your code has an issue: when you search for the index, you include the id column as well (x[1:10] covers id plus the first nine years), so for example row 1's id of 1 would match as a gain. If you want to exclude it, you can use x[-1] to drop the first column; the index returned then counts from the second column, i.e. from the first year.

    tcc_change$loss_init <- apply(tcc_change, 1, function(x) match(-1, x[1:10], nomatch = 99)) 
    tcc_change$gain <- apply(tcc_change, 1, function(x) match(1, x[1:10], nomatch=99))
    

    I adjusted your approach: first get the index of the -1, then use match again to search for a 1 starting just after that index. Once that is found, subtracting 1 gives the number of zeroes in between:

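    get_interval = function(x){
      # x[-1] drops the id column, so init is the position of the first -1 among the years
      init = match(-1, x[-1])
      # drop id plus the first init years, then find the first 1 in what remains
      interval = match(1, x[-(1:(init+1))]) - 1
      return(interval)
    }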
    
    > apply(tcc_change, 1, get_interval)
    [1] 2 2 3 4 0
    

    Hope that helps.
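
A slightly more defensive variant, in case some rows have no -1 at all or no 1 after the -1. This is only a sketch under a few assumptions: `years_to_regrowth` and `change` are names made up for illustration, the matrix simply retypes the five sample rows from the question without the id column, and only the first -1 in each row is considered.

```r
# Sample rows from the question, one column per year (id column left out).
change <- matrix(
  c(0,  0, 0,  0, 0, -1,  0, 0, 1, 0,
    0,  0, 0, -1, 0,  0,  1, 0, 0, 0,
    0,  0, 0, -1, 0,  0,  0, 1, 0, 0,
    0, -1, 0,  0, 0,  0,  1, 0, 0, 0,
    0,  0, 0,  1, 0,  0, -1, 1, 0, 0),
  nrow = 5, byrow = TRUE,
  dimnames = list(NULL, as.character(2001:2010))
)

# Hypothetical helper: number of positions strictly between the first loss (-1)
# and the first gain (1) that comes after it. Returns NA when there is no -1,
# or no 1 after the -1.
years_to_regrowth <- function(x) {
  loss <- match(-1, x)               # position of the first -1, NA if absent
  if (is.na(loss)) return(NA_integer_)
  gains <- which(x == 1)             # positions of every 1 in the row
  gain <- gains[gains > loss][1]     # first 1 after the loss, NA if none
  as.integer(gain - loss - 1)
}

gaps <- apply(change, 1, years_to_regrowth)
print(gaps)
# [1] 2 2 3 4 0
```

On the original data frame the id column would have to be dropped first, for example `apply(tcc_change[, -1], 1, years_to_regrowth)`, assuming the remaining columns are only the ten year columns. Because only gains after the loss are considered, a 1 that appears before the -1 (as in row 5) is ignored.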