
Assign list of character vectors to new data.table column in R


I'm trying to split a delimited string and build every pair of the resulting substrings. Once I have those, I want to assign the resulting list of character-vector pairs to a new column in a data.table, but I'm currently running into an error. Here is an MWE:

library(stringr)

# Return every pair of "|"-separated parts, pairing each part with each part
# before it (last part first). Only the first element of n_string is used.
decompose_string_pairs <- function(n_string = ""){
  parts <- str_split(n_string, "\\|")[[1]]  # split once instead of inside the loops
  n <- length(parts)
  string_pairs <- list()
  if (n < 2) return(string_pairs)            # no pairs without a delimiter

  for(i in n:2){
    for(j in (i-1):1){
      string_pairs <- c(string_pairs, list(c(parts[i], parts[j])))
    }
  }
  return(string_pairs)
}
test <- decompose_string_pairs("1|2|3")
test

which returns the expected list of character vectors:

> test
[[1]]
[1] "3" "2"

[[2]]
[1] "3" "1"

[[3]]
[1] "2" "1"
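As an aside, the same pairs can be built without the nested loops. The sketch below is a hypothetical alternative (`decompose_string_pairs_combn` is a name made up for illustration) that uses base R's `combn()` and, to stay dependency-free, base `strsplit()` in place of `stringr::str_split()`. Reversing the parts first makes `combn()` return the pairs in the same order as the loops above.

```r
# Hypothetical alternative to decompose_string_pairs(), built on base R only.
decompose_string_pairs_combn <- function(n_string = "") {
  # Split the first string only, like the original (so still not vectorized)
  parts <- strsplit(n_string, "|", fixed = TRUE)[[1]]
  # combn() errors when asked for pairs from fewer than 2 parts
  if (length(parts) < 2) return(list())
  # Reversed parts make combn()'s order match the loops:
  # "1|2|3" -> c("3","2"), c("3","1"), c("2","1")
  combn(rev(parts), 2, simplify = FALSE)
}

decompose_string_pairs_combn("1|2|3")
decompose_string_pairs_combn("1|4")
```

Like the loop version, this still handles one string per call, so it does not by itself fix the assignment problem below.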

Where I run into trouble is assigning this result to a new column in a data.table:

library(data.table)
dt <- data.table(strings <- c("1|2|3","1|4"))  # "<-" rather than "=" creates a global `strings` and names the column V1
dt[, parsed_list := decompose_string_pairs(strings)]

But on running this I get:

Supplied 3 items to be assigned to 2 items of column 'parsed_list'. If you wish to 'recycle' the RHS please use rep() to make this intent clear to readers of your code.

When I try to work around it with:

dt[, parsed_list := rep(decompose_string_pairs(strings))]

or

dt[, parsed_list := rep_len(decompose_string_pairs(strings))]

Neither works: the first gives the same error, and the second gives:

Error in rep_len(decompose_string_pairs(strings)) : 
  argument "length.out" is missing, with no default
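The error is a plain length mismatch. The minimal sketch below uses the three pairs printed above as a stand-in for `decompose_string_pairs(strings)` (an assumption, so it runs without the function), compares that length with the number of rows, and shows why neither `rep()` nor `rep_len()` can help.

```r
library(data.table)

# Stand-in for decompose_string_pairs(c("1|2|3", "1|4")): the function only
# reads the first string, so it returns the 3 pairs printed above.
pairs <- list(c("3", "2"), c("3", "1"), c("2", "1"))
dt <- data.table(strings = c("1|2|3", "1|4"))

length(pairs)  # 3 items on the right-hand side ...
nrow(dt)       # ... but only 2 rows to assign them to

# rep() without 'times' returns its input unchanged, so it is still 3 long
identical(rep(pairs), pairs)

# rep_len() with an explicit length just truncates: the third pair is dropped,
# and the pairs are still not grouped by the row they came from
rep_len(pairs, nrow(dt))
```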

How can I store lists in a column of a data.table? I know it's possible, but I can't figure out the structure I need.


Solution

  • The problem is that your function isn't vectorized: it only handles one string at a time. If you run

    decompose_string_pairs(c("1|2|3","1|4"))
    # [[1]]
    # [1] "3" "2"
    # [[2]]
    # [1] "3" "1"
    # [[3]]
    # [1] "2" "1"
    

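    you only get the pairs for the first string: 3 items for a data.table with 2 rows, which is exactly the mismatch the error reports. If you vectorize the function so that it returns one element per input string, the assignment works: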

    decompose_string_pairs_vec <- Vectorize(decompose_string_pairs)
    dt[, parsed_list := decompose_string_pairs_vec(strings)]
    dt
    #       V1 parsed_list
    # 1: 1|2|3   <list[3]>
    # 2:   1|4   <list[1]>
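An equivalent approach is `lapply()`, which always returns a list with one element per input string; a list of length `nrow(dt)` assigned with `:=` becomes a list column. The sketch below assumes the column is created with `=` (so it is named `strings` rather than V1) and re-implements the question's loop with base `strsplit()` instead of stringr so it runs on its own; both are assumptions for illustration.

```r
library(data.table)

# The question's pair logic, with base strsplit() in place of stringr (assumption)
decompose_string_pairs <- function(n_string = "") {
  parts <- strsplit(n_string, "|", fixed = TRUE)[[1]]
  n <- length(parts)
  string_pairs <- list()
  if (n < 2) return(string_pairs)
  for (i in n:2) {
    for (j in (i - 1):1) {
      string_pairs <- c(string_pairs, list(c(parts[i], parts[j])))
    }
  }
  string_pairs
}

dt <- data.table(strings = c("1|2|3", "1|4"))

# One list element per row, so := stores the result as a list column
dt[, parsed_list := lapply(strings, decompose_string_pairs)]

dt$parsed_list[[1]]  # the three pairs for "1|2|3"
dt$parsed_list[[2]]  # the single pair for "1|4"
```

`Vectorize()` wraps `mapply()` with `SIMPLIFY = TRUE`, so it may try to simplify the result when every string yields the same number of pairs; `lapply()` never simplifies, which keeps the right-hand side a plain list of length `nrow(dt)`.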