I'm trying to split a delimited string and build every pair of the resulting substrings. Once I have that, I want to assign the resulting list of character-vector pairs to a new column in a data.table, but I'm currently running into an error. Here is an MWE:
library(stringr)
decompose_string_pairs <- function(n_string = ""){
n <- str_count(n_string,"\\|")[[1]] + 1
string_pairs <- list()
for(i in n:2){
n1 <- str_split(n_string, "\\|")[[1]][i]
for(j in (i-1):1){
n2 <- str_split(n_string,"\\|")[[1]][j]
temp_pair <- list(c(n1,n2))
string_pairs <- c(string_pairs,temp_pair)
}
}
return(string_pairs)
}
test <- decompose_string_pairs("1|2|3")
test
which returns the expected list of character vectors:
> test
[[1]]
[1] "3" "2"
[[2]]
[1] "3" "1"
[[3]]
[1] "2" "1"
Where I'm running into trouble is then assigning this result to a new column in a data table:
library(data.table)
dt <- data.table(strings <- c("1|2|3","1|4"))
dt[, parsed_list := decompose_string_pairs(strings)]
But on running this I get:
Supplied 3 items to be assigned to 2 items of column 'parsed_list'. If you wish to 'recycle' the RHS please use rep() to make this intent clear to readers of your code.
Yet when I try to modify the dt assignment with:
dt[, parsed_list := rep(decompose_string_pairs(strings))]
or
dt[, parsed_list := rep_len(decompose_string_pairs(strings))]
the assignment still fails: I get the same error for the first, and
Error in rep_len(decompose_string_pairs(strings)) :
argument "length.out" is missing, with no default
for the second.
How can I store lists in columns of a dt? I know it's possible but I can't figure out the structure I need at the moment.
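The problem is that your function isn't vectorized. It only expects one value at a time: the [[1]] after str_count() and str_split() keeps only the result for the first string and silently drops the rest. If you run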
decompose_string_pairs(c("1|2|3","1|4"))
# [[1]]
# [1] "3" "2"
# [[2]]
# [1] "3" "1"
# [[3]]
# [1] "2" "1"
you can see that you only get the output for the first string. If you vectorize your function, the assignment works:
decompose_string_pairs_vec <- Vectorize(decompose_string_pairs)
dt[, parsed_list := decompose_string_pairs_vec(strings)]
dt
# V1 parsed_list
# 1: 1|2|3 <list[3]>
# 2: 1|4 <list[1]>
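If you would rather not rely on Vectorize(), a rough alternative is to loop over the split strings explicitly and let combn() build the pairs. This is only a sketch: decompose_string_pairs_all is a hypothetical name, it swaps stringr for base strsplit() (which drops a trailing empty field, unlike str_split()), and it assumes a string with fewer than two substrings should yield an empty list. The column is also created with strings = rather than strings <-, so it is named strings instead of V1.

```r
library(data.table)

# Hypothetical helper (not from the original post): returns one list of
# pairs per input string, so the result has one element per row.
decompose_string_pairs_all <- function(strings, sep = "|") {
  lapply(strsplit(strings, sep, fixed = TRUE), function(parts) {
    # Assumption: fewer than two substrings means there are no pairs.
    if (length(parts) < 2) return(list())
    # Reversing first reproduces the ordering of the original loops
    # (last substring first): "1|2|3" gives ("3","2"), ("3","1"), ("2","1").
    combn(rev(parts), 2, simplify = FALSE)
  })
}

dt <- data.table(strings = c("1|2|3", "1|4"))
dt[, parsed_list := decompose_string_pairs_all(strings)]

# Each cell of parsed_list is itself a list of character vectors.
dt$parsed_list[[1]]
```

Because the helper returns exactly one list element per input string, the right-hand side has the same length as the number of rows, which is what := needs for a list column.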