I am currently trying to separate a column that contains a range of value (as character) into two numeric columns to compute the mean of them if a row has a range. Then, I want to replace the old column with the updated result one. Actually, there are multiple columns to be done.. I tried to find a way, but it seems challenging to me.
Below is the code that I have tried.. which does not work..
test.val <- data.table(id = c(1, 2, 3),
colA = c("100-150", "200", "300"),
colB = c("15", "20-30", "10"))
test.A <- test.val[, lapply(.SD, function(x){strsplit(x, split = "-")}), .SDcols = c("colA", "colB")]
test.B[, lapply(.SD, mean), .SDcols = c("colA", "colB")]
At the end, I'd like to have the following:
id colA colB
1: 1 125 15
2: 2 200 25
3: 3 300 10
Anyone that can help me out? Many thanks.
Another option using data.table
library(data.table)
cols <- c("colA", "colB")
for(j in cols) {
tmp <- vapply(strsplit(test.val[[j]], "-"),
FUN = function(i) mean(as.numeric(i)),
FUN.VALUE = numeric(1))
set(test.val, j = j, value = tmp)
}
test.val
# id colA colB
#1: 1 125 15
#2: 2 200 25
#3: 3 300 10
Given a vector
x <- c("100-150", "200", "300")
the result of strsplit
is a list of character vectors
strsplit(x, "-")
#[[1]]
#[1] "100" "150"
#[[2]]
#[1] "200"
#[[3]]
#[1] "300"
We wrap this into vapply
and calculate the mean for each element after we converted each vector to numeric.
vapply(strsplit(x, "-"), function(x) mean(as.numeric(x)), numeric(1))
# [1] 125 200 300
We use this result to replace every column specified in cols
using data.table
's set
function.