Search code examples
rcoalesce

How to implement coalesce efficiently in R


Background

Several SQL languages (I mostly use postgreSQL) have a function called coalesce which returns the first non null column element for each row. This can be very efficient to use when tables have a lot of NULL elements in them.

I encounter this in a lot of scenarios in R as well when dealing with not so structured data which has a lot of NA's in them.

I have made a naive implementation myself but it is ridiculously slow.

coalesce <- function(...) {
  apply(cbind(...), 1, function(x) {
          x[which(!is.na(x))[1]]
        })
}

Example

a <- c(1,  2,  NA, 4, NA)
b <- c(NA, NA, NA, 5, 6)
c <- c(7,  8,  NA, 9, 10)
coalesce(a,b,c)
# [1]  1  2 NA  4  6

Question

Is there any efficient way to implement coalesce in R?


Solution

  • On my machine, using Reduce gets a 5x performance improvement:

    coalesce2 <- function(...) {
      Reduce(function(x, y) {
        i <- which(is.na(x))
        x[i] <- y[i]
        x},
      list(...))
    }
    
    > microbenchmark(coalesce(a,b,c),coalesce2(a,b,c))
    Unit: microseconds
                   expr    min       lq   median       uq     max neval
      coalesce(a, b, c) 97.669 100.7950 102.0120 103.0505 243.438   100
     coalesce2(a, b, c) 19.601  21.4055  22.8835  23.8315  45.419   100