I'm working on a project that is looking at regrowth of trees after a deforestation event. To simplify the data set for this question, I have a matrix (converted from data frame), which has 10 columns corresponding to years 2001-2010.
-1 indicates a change point in the data, when a previously forested plot was deforested. 1 indicates when a previously deforested region became forested. 0s indicate no change in state.
I found this link which I think does what I need to do, except in Python/C++. Since I did the rest of my analyses in R, I want to stick with it.
So I was trying to translate some of the code to R, but I've been having problems.
This is my sample data set. One of my alternative thoughts is that if I could identify the index of (-1) and then the index of 1, then I could subtract these two indices to get the difference (and then subtract 1 to account for factoring in the first index in the subtraction)
# Example data
head(tcc_change)
  id 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010
1  1    0    0    0    0    0   -1    0    0    1    0
2  2    0    0    0   -1    0    0    1    0    0    0
3  3    0    0    0   -1    0    0    0    1    0    0
4  4    0   -1    0    0    0    0    1    0    0    0
5  5    0    0    0    1    0    0   -1    1    0    0
# Indexing attempt
tcc_change$loss_init <- apply(tcc_change, 1, function(x) match(-1, x[1:10], nomatch = 99))
tcc_change$gain <- apply(tcc_change, 1, function(x) match(1, x[1:10], nomatch=99))
This method has a lot of problems, though. What if there's a 1 before a -1, for example? I'd like to figure out a better way to do this analysis, similar to the logical structure in the link above, but I don't know how to do this in R.
Ideally I'd like to identify points where there was deforestation (-1) and then regrowth (1) and then count the zeroes in between. The number of zeroes in between would be posted to a new column. This would give me a better idea of how long it takes for a plot to become forested after a deforestation event. If there are no zeroes in between (like row 5), I would want the code to output '0'.
Sorry, my function may only handle the simple case.
First, your code has an issue: when you search for the index, you include the id column as well (in x[1:10]). If you want to exclude it, you can use x[-1] to drop the first column, but then the index counts from the second column onward (so 2001 is position 1).
tcc_change$loss_init <- apply(tcc_change, 1, function(x) match(-1, x[1:10], nomatch = 99))
tcc_change$gain <- apply(tcc_change, 1, function(x) match(1, x[1:10], nomatch=99))
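To make the id problem concrete, here is a small check on row 1 of the sample data, typed in by hand as a plain vector (the variable names are just for illustration). Row 1 has id 1, so searching x[1:10] for 1 hits the id itself rather than a regrowth year. Note that x[1:10] also leaves out the 2010 column.

```r
# Row 1 of the sample data: id first, then the ten year columns 2001-2010
x <- c(1, 0, 0, 0, 0, 0, -1, 0, 0, 1, 0)

hit_with_id     <- match(1, x[1:10])   # 1: this is the id, not a regrowth year
hit_without_id  <- match(1, x[-1])     # 9: ninth year column, i.e. 2009
loss_without_id <- match(-1, x[-1])    # 6: sixth year column, i.e. 2006
```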
I adjusted your approach: first get the index of -1, then use match again to search for the index of 1, starting just after the index of -1. Once that is found, subtract 1 to get the number of zeroes in between:
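get_interval = function(x){
  # position of the first -1 among the year columns (x[-1] drops the id)
  init = match(-1, x[-1])
  if (is.na(init)) return(NA)  # no deforestation event in this row
  # drop the id and everything up to and including the -1, then find the
  # first 1; subtracting 1 leaves the number of zeroes in between
  interval = match(1, x[-(1:(init+1))]) - 1
  return(interval)
}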
> apply(tcc_change, 1, get_interval)
[1] 2 2 3 4 0
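If it is useful, here is a slightly more defensive sketch along the same lines that also writes the result to a new column, as the question asks. The function name zeros_between and the column name regrowth_gap are made up for illustration, and returning NA for rows with no -1 (or no 1 after it) is my assumption about what you would want in those cases.

```r
# Rebuild the sample data from the question
years <- rbind(
  c(0,  0, 0,  0, 0, -1,  0, 0, 1, 0),
  c(0,  0, 0, -1, 0,  0,  1, 0, 0, 0),
  c(0,  0, 0, -1, 0,  0,  0, 1, 0, 0),
  c(0, -1, 0,  0, 0,  0,  1, 0, 0, 0),
  c(0,  0, 0,  1, 0,  0, -1, 1, 0, 0)
)
colnames(years) <- as.character(2001:2010)
tcc_change <- data.frame(id = 1:5, years, check.names = FALSE)

# Hypothetical helper: x holds the year columns only (no id)
zeros_between <- function(x) {
  loss <- match(-1, x)                  # position of the first -1
  # Assumption: NA when there is no loss, or the loss is in the last year
  # (the second check also avoids (loss + 1):length(x) counting backwards)
  if (is.na(loss) || loss == length(x)) return(NA_integer_)
  after <- x[(loss + 1):length(x)]      # years following the loss
  gain <- match(1, after)               # first regrowth after the loss
  if (is.na(gain)) return(NA_integer_)  # assumption: NA when no regrowth
  gain - 1L                             # zeroes between -1 and the next 1
}

# Pass only the year columns, so the id can never be mistaken for a 1
tcc_change$regrowth_gap <- apply(tcc_change[, -1], 1, zeros_between)
tcc_change$regrowth_gap
# 2 2 3 4 0
```

Like the function above, this only looks at the first -1 in each row. Rows with several loss/regrowth cycles would need a loop over every -1.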
Hope that helps.