I have som data in a data.table that I want to to sum. Trivial example supplied below. I start with (variable) n-columns (char) and some group variables.
I can convert them (nsk-vars in my example) to numeric/integer beforehand, if that's the way to go. I would like to be able to use a string of columnames to create a new column which is the sum of these(nsk1-nsk3) and handle NA:s. How do I do this in a resonable way?
library(data.table)
x <- data.table(a = c("GrpA", "GrpB"),
nsk1 = c("1","3"),
nsk2 = c(NA,"1"),
nsk3 = c("3", "4"))
ClNamesStr <- colnames(x)
ClNamesStr <- ClNamesStr[grepl("^nsk", ClNamesStr)]
nskStrSum <- paste(ClNamesStr, collapse = "),as.numeric(")
nskStrSum <- paste("as.numeric(", nskStrSum, "), na.rm = TRUE")
This is the part that I would like to get working.
x[, nsk0 := sum(eval(nskStrSum)), by = a]
The desired output would be this:
a nsk1 nsk2 nsk3 nsk0
GrpA 1 NA 3 4
GrpB 3 1 4 8
Here is one way to go.
x[, nsk0 := rowSums(sapply(.SD, function(i) as.numeric(i)), na.rm=TRUE),
.SDcols=grep("^nsk", names(x)), by = a]
which returns
x
a nsk1 nsk2 nsk3 othr nsk0
1: GrpA 1 NA 3 a 4
2: GrpA 3 1 4 b 8
sapply
runs through the selected variables, converts each to a numeric and returns a matrix. This matrix is passed to rowSums
which sums up the values in each row and ignores the NA
s. .SD
is a shorthand to select of the data.table, (except variables contained in "by"). However, the variables are selected among them using .SDcols
along with grep
.
Some data.tablers might not be happy with this solution as it converts some of the data into a matrix. However, this conversion may be necessary in your case.
I added an additional "nuisance" variable to show that it will work.
data
x <- data.table(a = c("GrpA", "GrpA"),
nsk1 = c("1","3"),
nsk2 = c(NA,"1"),
nsk3 = c("3", "4"),
othr = letters[1:2])