
R -- use binary df1 to group measurements in df2, then see if any measurements within each group meet a specific condition to output a new binary df3


I'm trying to translate some old Excel functions to R, and this final step in identifying insect outbreaks is the most challenging for me. Input will include two timeseries datasets: a binary dataset reporting outbreak or non-outbreak conditions over multiple trees, and an equivalent dataset of tree ring-width indices representing low or high growth for the same trees and same time scale.

I'll start by posting the Excel code and what it's attempting to do, and then R sample dfs underneath:

=IF((( IF( outbreak.year > prev.outbreak.year; SMALL( index.year : INDIRECT( ADDRESS(( ROW( outbreak.year) + (( MATCH( 0; outbreak.year : 100th.outbreak.ahead; 0)) - 1) - 1); COLUMN( index.year))); 1))) < 1.28); 1; IF( outbreak.year; prev.outbreak.year; 0))

outbreak.year, prev.outbreak.year, and 100th.outbreak.ahead refer to cells in a binary dataset indicating whether an outbreak is occurring or not; index.year refers to a dataset of equal size with tree ring growth indices.

This function should start by identifying whether an outbreak, 1, is initiating after a non-outbreak, 0, year. If it is, SMALL(), INDIRECT(), ADDRESS(), MATCH(), ROW(), and COLUMN() are all used in a sort of while loop to look ahead in the outbreak dataset until it stops reporting 1, then look at the smallest number in the equivalent 'group' of values in the index dataset and see if it's < 1.28. If it is, keep the group as 1s, if it's not, return the group as non-outbreak conditions, or 0s.

E.g., I'll have these as input:

df1 <- data.frame(t1 = c(0,0,0,1,1,1,1,1,0,0), t2 = c(0,0,0,0,0,1,1,1,1,1), t3 = c(0,0,1,1,1,1,1,1,1,0), t4 = c(0,0,1,1,1,1,1,0,0,1), t5 = c(0,1,1,1,1,1,0,0,1,1), row.names = 2000:2009)

df2 <- data.frame(t1 = c(0.12,0.54,-1.2,-0.3,-0.6,-1.29,-1.30,-0.5,0.3,0.5), t2 = c(0.9,0.8,0.32,0.9,-0.3,-0.4,-0.9,-1.1,-1.12,-1.14), t3 = c(-0.3,0.1,-1.11,-1.14,-1.45,-1.29,-1.68,-1.01,-0.6,0.1), t4 = c(-0.3,-0.34,-0.6,-0.9,-0.8,-1.1,-1.36,-0.4,0.5,0.3), t5 = c(1.45,-0.05,-0.12,-1.26,-0.21,-1.18,-1.01,-0.03,-0.6,-1.39), row.names = 2000:2009)

I want to turn df1 into df3, keeping each run of 1s only if at least one of the equivalent [i,j] values in df2 is smaller than -1.28. Note that t2 loses its outbreak and t5 loses its first outbreak because growth never drops low enough, as seen in df2:

           df1                         df2                                  df3
      t1 t2 t3 t4 t5             t1    t2    t3    t4    t5           t1 t2 t3 t4 t5
2000   0  0  0  0  0     2000  0.12  0.90 -0.30 -0.30  1.45     2000   0  0  0  0  0
2001   0  0  0  0  1     2001  0.54  0.80  0.10 -0.34 -0.05     2001   0  0  0  0  0
2002   0  0  1  1  1     2002 -1.20  0.32 -1.11 -0.60 -0.12     2002   0  0  1  1  0
2003   1  0  1  1  1     2003 -0.30  0.90 -1.14 -0.90 -1.26     2003   1  0  1  1  0
2004   1  0  1  1  1     2004 -0.60 -0.30 -1.45 -0.80 -0.21     2004   1  0  1  1  0
2005   1  1  1  1  1     2005 -1.29 -0.40 -1.29 -1.10 -1.18     2005   1  0  1  1  0
2006   1  1  1  1  0     2006 -1.30 -0.90 -1.68 -1.36 -1.01     2006   1  0  1  1  0
2007   1  1  1  0  0     2007 -0.50 -1.10 -1.01 -0.40 -0.03     2007   1  0  1  0  0
2008   0  1  1  0  1     2008  0.03 -1.12 -0.60  0.50 -0.60     2008   0  0  1  0  1
2009   0  1  0  0  1     2009  0.50 -1.14  0.10  0.30 -1.39     2009   0  0  0  0  1

It's hard to show my progress, as I barely know where to start or whether I'm even working in the right direction. My current attempt is a while loop that shifts df1 by one more lag each time until it hits a 0, but then I get lost just staring at the entire (ugly) thing:

for( i in 1:dim( df1)[1]) {
  for( j in 1:dim( df1)[2]) {
    if( df1[i,j] > shift( df1, n = 1)) {
      n <- 1
      while( shift( df1, n = n) == 1) {
        shift( df1, n =+ 1)
        df3[i,j] <- 1
      } 
    } else { df3[i,j] <- 0 }
  }
}      

Thanks for any help you might have!


Solution

  • Here is how I would have asked your question. I have three data frames, A, B, and C. I need to convert A into C using values from B. Here are the data frames:

    A <- data.frame(c1=c(0, 1, 1, 0, 1, 1), c2=c(0, 1, 1, 1, 1, 0))
    B <- data.frame(c1=c(2, 2, 4, 4, 2, 3), c2=c(0, 2, 3, 4, 1, 4))
    C <- data.frame(c1=c(0, 1, 1, 0, 0, 0), c2=c(0, 1, 1, 1, 1, 0))
    

    Each run of consecutive 1s in each column of A represents a group. I need to set a group in A to zero if none of the corresponding values in B is greater than 3. For example, the first group in A$c1 covers the 2nd and 3rd values in that column. One of the matching values in B$c1 (the 3rd, which is 4) is greater than 3, so I keep that group. The second group in A$c1 covers the 5th and 6th values, but neither matching value in B$c1 (2 and 3) is greater than 3, so I don't keep that group.


    An answer:

    # Generate IDs for each sequence of 1s or 0s in each column of A
    
    A.splits <- lapply(A, function(x) cumsum(c(0, abs(diff(x)))))
    
    # Loop through each column, and split the values of B by the
    # groups in A.  For each group, return 1 for every row in the
    # group if any of its B values is greater than 3, else 0
    
    A.keep <- Map(
      ave, B, A.splits, MoreArgs=list(FUN=function(x) !!any(x > 3))
    )
    # remove ones by multiplying each column of `A` against `A.keep`
    # and confirm results are the same as expected
    
    all.equal(A * A.keep, C)
    ## [1] TRUE
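
For reference, here is a minimal sketch of the same approach applied to the sample data from the question. It assumes the cutoff is -1.28 with a "less than" test (taken from the question's prose, not the Excel formula), and the names `threshold`, `run.ids`, `keep`, and `expected` are introduced here for illustration only. The inputs are copied from the question's `data.frame()` calls, and `expected` is copied from the df3 table.

```r
# Sample inputs copied from the question's data.frame() calls
df1 <- data.frame(t1 = c(0,0,0,1,1,1,1,1,0,0), t2 = c(0,0,0,0,0,1,1,1,1,1),
                  t3 = c(0,0,1,1,1,1,1,1,1,0), t4 = c(0,0,1,1,1,1,1,0,0,1),
                  t5 = c(0,1,1,1,1,1,0,0,1,1), row.names = 2000:2009)
df2 <- data.frame(t1 = c(0.12,0.54,-1.2,-0.3,-0.6,-1.29,-1.30,-0.5,0.3,0.5),
                  t2 = c(0.9,0.8,0.32,0.9,-0.3,-0.4,-0.9,-1.1,-1.12,-1.14),
                  t3 = c(-0.3,0.1,-1.11,-1.14,-1.45,-1.29,-1.68,-1.01,-0.6,0.1),
                  t4 = c(-0.3,-0.34,-0.6,-0.9,-0.8,-1.1,-1.36,-0.4,0.5,0.3),
                  t5 = c(1.45,-0.05,-0.12,-1.26,-0.21,-1.18,-1.01,-0.03,-0.6,-1.39),
                  row.names = 2000:2009)

# Assumed cutoff: a run of 1s is kept only if some index value is below this
threshold <- -1.28

# Give every run of identical values in each column of df1 its own ID
run.ids <- lapply(df1, function(x) cumsum(c(0, abs(diff(x)))))

# Within each run, flag every row with 1 if any df2 value in that run
# falls below the threshold, else 0
keep <- Map(
  ave, df2, run.ids,
  MoreArgs = list(FUN = function(x) any(x < threshold))
)

# Multiplying by df1 zeroes out non-outbreak rows and rejected runs
df3 <- df1 * keep

# Expected result, copied from the df3 table in the question
expected <- data.frame(t1 = c(0,0,0,1,1,1,1,1,0,0), t2 = rep(0, 10),
                       t3 = c(0,0,1,1,1,1,1,1,1,0), t4 = c(0,0,1,1,1,1,1,0,0,0),
                       t5 = c(0,0,0,0,0,0,0,0,1,1), row.names = 2000:2009)
all.equal(df3, expected, check.attributes = FALSE)
```

The only changes from the answer above are the direction of the comparison and the cutoff value. Runs of 0s can also be flagged by `keep`, but multiplying by `df1` sets them back to 0, so only runs of 1s are affected in the result.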