Search code examples
rdata.tablepastesplit-apply-combine

Efficient way to paste multiple column pairs in R data.table


I'm looking for an efficient way to paste/combine multiple pairs of adjacent columns at once using data.table. My feeble attempt is slow and not so elegant:

library(data.table)
dt <- data.table(ids = 1:3,
                 x1 = c("A","B","C"),
                 x2 = 1:3,
                 y1 = c("D", "E", "F"),
                 y2 = 4:6,
                 z1 = c("G", "H", "I"),
                 z3 = 7:9)

paste.pairs <- function(x, sep = "-"){
  xx <- unlist(x)
  x.len <- length(x)

  r <- rep(NA, x.len/2)
  s <- seq(1, x.len, by = 2)

  for(i in 1:(x.len/2)) {
    r[i] <- paste(xx[i], xx[i+1], sep = sep)
  }
  return(as.list(r))
}

dt[, paste.pairs(.SD), by = "ids"]

Is there a better way?


Solution

  • An option with Map by creating column index with seq

    i1 <- seq(1, length(dt)-1, 2)
    i2 <- seq(2, length(dt)-1, 2)
    dt[, Map(paste,
             .SD[, i1, with = FALSE], .SD[, i2, with = FALSE], 
             MoreArgs = list(sep="-")), 
       by = "ids"]
    

    Another option would be to split by the names of the dataset and then paste

    data.frame(lapply(split.default(dt[, -1, with = FALSE],
        sub("\\d+$", "", names(dt)[-1])), function(x) do.call(paste, c(x, sep="-"))))
    #  x   y   z
    #1 A-1 D-4 G-7
    #2 B-2 E-5 H-8
    #3 C-3 F-6 I-9
    

    Or another option is with melt/dcast

    dcast(melt(dt, id.var = 'ids')[,  paste(value, collapse = "-"),
      .(grp = sub("\\d+", "", variable), ids)], ids ~ grp, value.var = 'V1')