row-wise first/last occurrences from column series in data.table

As a follow-up to this oldie but goodie: efficient row-wise operations on a data.table

I have some data that (unfortunately) look like:

library('data.table')
set.seed(1234)
m <- 5
n <- 7
rb <- function() runif(m,1000,2000) * rbinom(m,1,0.5)   
series_col_nms <- paste0('YearNo',1:n)    
rev <- data.table(cust_id = paste0('CustNo',1:m), 
                other_stuff = sample(letters,m, replace=TRUE))
for(col in series_col_nms){
  set(rev, j=col, value=rb())
}
setkey(rev, cust_id)

One row per customer with various columns including yearly revenue for year 1, 2, ...

I would like to get the year indices for the first and last years with any revenue from each customer.

I can produce the desired result, but with a somewhat hacky join:

years_active <- rev[, which(.SD>0), .SDcols = series_col_nms, 
                    keyby=cust_id][, .(min_year_active = min(V1),
                                       max_year_active = max(V1)), keyby=cust_id]
years_active[rev]

These attempts to get the minimum index fail:

rev[, apply(.SD, 1, function(x) min(which(x>0))), .SDcols=series_col_nms, by=cust_id] # returns data type error    
rev[, do.call(pmin, lapply(.SD, function(x) which(x>0))), .SDcols=series_col_nms, by=cust_id] # returns empty

What is the data.table way to do this?

Solution

When you want to operate on many columns by row, the common practice it to melt your data set first and then operate on a single column.

In your case, a relatively straightforward solution could be something like

res <- melt(rev, id = 1:2)[, 
         as.list({
            temp <- value != 0
            if (any(temp)) range(which(temp)) else rep(NA_integer_, 2)
         }), 
        by = cust_id]

rev[, c("Min", "Max") := res[, .(V1, V2)]]
rev
#    cust_id other_stuff  YearNo1 YearNo2  YearNo3  YearNo4  YearNo5  YearNo6  YearNo7 Min Max
# 1: CustNo1           c 1640.311       0    0.000 1759.671    0.000 1503.933    0.000   1   6
# 2: CustNo2           q 1009.496       0    0.000 1201.248    0.000    0.000 1308.095   1   7
# 3: CustNo3           p    0.000       0    0.000    0.000 1484.991    0.000    0.000   5   5
# 4: CustNo4           q 1666.084       0 1831.345 1992.150 1243.929    0.000 1051.647   1   7
# 5: CustNo5           w    0.000       0    0.000    0.000    0.000    0.000    0.000  NA  NA

A cleaner version but with warnings could be

melt(rev, id = 1:2)[, as.list(as.integer(range(which(value != 0)))), by = cust_id]