Search code examples
rapply

Build columns based on specific conditions of data frame rows


I would like to create code in R to capture the name of the column where the value 1 appears for the first time per line.

Later, given that the value 1 is present in the row in question, I would like to capture the name of the column where the value 2 last appears before the value 1.

In a practical way, in possession of the following set of data (example):

structure(list(id = c(1, 2, 3, 4, 5, 6), `2004` = c(2, 2, 0,
2, 2, 0), `2005` = c(2, 2, 0, 2, 0, 0), `2006` = c(2, 2, 0, 2,
2, 0), `2007` = c(2, 2, 0, 2, 2, 1), `2008` = c(2, 2, 0, 2, 2,
0), `2009` = c(2, 2, 0, 2, 0, 1), `2010` = c(2, 2, 0, 2, 0, 0
), `2011` = c(0, 0, 0, 1, 0, 0), `2012` = c(0, 0, 1, 1, 0, 1),
     `2013` = c(0, 0, 0, 1, 0, 0), `2014` = c(0, 0, 0, 1, 0, 1
     ), `2015` = c(1, 0, 0, 1, 0, 0), `2016` = c(0, 0, 0, 1, 0,
     0), `2017` = c(0, 0, 0, 1, 0, 1), `2018` = c(0, 0, 0, 1,
     0, 1), `2019` = c(0, 0, 0, 1, 0, 0)), row.names = c(NA, -6L
), class = c("data.table", "data.frame"))

I need to get something like:

structure(list(id = c(1, 2, 3, 4, 5, 6), `2004` = c(2, 2, 0,
2, 2, 2), `2005` = c(2, 2, 0, 2, 0, 0), `2006` = c(2, 2, 0, 2,
2, 0), `2007` = c(2, 2, 0, 2, 2, 1), `2008` = c(2, 2, 0, 2, 2,
0), `2009` = c(2, 2, 0, 2, 0, 1), `2010` = c(2, 2, 0, 2, 0, 0
), `2011` = c(0, 0, 0, 1, 0, 0), `2012` = c(0, 0, 1, 1, 0, 1),
     `2013` = c(0, 0, 0, 1, 0, 0), `2014` = c(0, 0, 0, 1, 0, 1
     ), `2015` = c(1, 0, 0, 1, 0, 0), `2016` = c(0, 0, 0, 1, 0,
     0), `2017` = c(0, 0, 0, 1, 0, 1), `2018` = c(0, 0, 0, 1,
     0, 1), `2019` = c(0, 0, 0, 1, 0, 0), situation1 = c(2015,
     NA, 2012, 2011, NA, 2007), situation2 = c(2010, NA, NA, 2010,
     NA, 2004)), row.names = c(NA, -6L), class = c("data.table",
"data.frame"))

I tried some possibilities, like using apply per line, but I wasn't successful, thanks for any help.


Solution

  • Solution with base R only:

    d1$situation1 <- apply(d1[,-1], 1, \(x) as.numeric(names(which(x == 1)[1])))
    d1$situation2 <- apply(d1[,-1], 1, \(x) {
      x <- x[1:16][as.numeric(names(x[1:16])) < as.numeric(x[17])]
      as.numeric(names(which(rev(x) == 2)[1]))
    })
    
    > d1
      id 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 situation1 situation2
    1  1    2    2    2    2    2    2    2    0    0    0    0    1    0    0    0    0       2015       2010
    2  2    2    2    2    2    2    2    2    0    0    0    0    0    0    0    0    0         NA         NA
    3  3    0    0    0    0    0    0    0    0    1    0    0    0    0    0    0    0       2012         NA
    4  4    2    2    2    2    2    2    2    1    1    1    1    1    1    1    1    1       2011       2010
    5  5    2    0    2    2    2    0    0    0    0    0    0    0    0    0    0    0         NA         NA
    6  6    0    0    0    1    0    1    0    0    1    0    1    0    0    1    1    0       2007         NA