Normalize each row of data.table


This seems like it should be easy, but I can't find an answer :(. I'm trying to normalize each row of a data.table like this:

normalize <- function(x) {
  s = sum(x)
  if (s>0) {
    return(x/s)
  } else {
    return(rep(0, length(x)))  # non-positive sum: return a row of zeros
  }
}

How do I call this function on every row of a data.table and get a normalized data.table back? I can do a for loop, but that's surely not the right way, and apply(data, 1, normalize) will, as I understand, convert my data.table to a matrix which will be a big performance hit.


Solution

  • Here's one way to avoid coercing to a matrix:

    cols = names(DT)
    DT[, s := Reduce("+",.SD)]
    DT[s > 0, (cols) := lapply(.SD,"/",s), .SDcols = cols]
    DT[s <=  0, (cols) := 0]
    DT[, s := NULL]
    

    This is what I would do if there were a good reason (in a later step) to keep the data in a data.table rather than a matrix.
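
To see the steps above on concrete numbers, here is a minimal sketch that wraps them in a helper and runs it on a toy table. The name `normalize_rows` and the sample data are made up for illustration. It assumes every column is numeric (double) and that `DT` has no existing column called `s`.

```r
library(data.table)

# Hypothetical helper wrapping the answer's steps; modifies DT by reference.
# Assumes all columns are double and that no column is already named "s".
normalize_rows <- function(DT) {
  cols <- names(DT)
  DT[, s := Reduce(`+`, .SD), .SDcols = cols]               # row sums, no matrix coercion
  DT[s > 0, (cols) := lapply(.SD, `/`, s), .SDcols = cols]  # divide rows with a positive sum
  DT[s <= 0, (cols) := 0]                                   # zero out the remaining rows
  DT[, s := NULL]                                           # drop the helper column
  DT[]
}

# Toy data: the second row sums to zero.
DT <- data.table(a = c(1, 0, 2), b = c(3, 0, 2), c = c(4, 0, 4))
normalize_rows(DT)
print(DT)
```

The first and third rows both sum to 8, so they become (0.125, 0.375, 0.5) and (0.25, 0.25, 0.5); the all-zero row stays at zero. If a matrix is acceptable after all, `m / rowSums(m)` performs the same division, though rows with a zero sum would then come out as NaN and need separate handling.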