
R: split string vector by delimiter and rearrange


I have a string vector that needs to be split and rearranged into a matrix in a certain way. I know how to do the split and a simple rearrangement, but I'm lost on how to rearrange it the way I want:

library(stringi)

vec = c("b;a;c","a;c","c;b")
q = stri_split_fixed(vec, ";", simplify = TRUE,fill=T)
View(q)

V1  V2  V3
b   a   c
a   c    
c   b    

Desired output

V1  V2  V3
a   b   c
a       c 
    b   c 

Thank you!

EDIT:

The letters above are for simplicity. The real options are (not an exhaustive list): D-Amazon Marketplace, U-Amazon, D-Amazon, U-Jet, etc. They all start with U or D, though.

Order: alphabetical, but grouped by retailer. If that is too complicated, no particular order is OK.


Solution

  • This solution generates a boolean matrix with each vector as a row, and each possible character as a column.

    possible_options = c('a', 'b', 'c')
    result <- sapply(possible_options, function(x) apply(q, 1, function(y) x %in% y))
    result
             a     b    c
    [1,]  TRUE  TRUE TRUE
    [2,]  TRUE FALSE TRUE
    [3,] FALSE  TRUE TRUE
    

    This solution requires a list of all the options. If you don't have one, you can either build a list of every possible option (for example, all lowercase and uppercase letters) and then drop the columns that never match:

    result <- sapply(c(letters, LETTERS), function(x) apply(q, 1, function(y) x %in% y))
    result <- result[, colSums(result) > 0]
    result
             a     b    c
    [1,]  TRUE  TRUE TRUE
    [2,]  TRUE FALSE TRUE
    [3,] FALSE  TRUE TRUE
    

    Or extract them from q itself:

    opts <- unique(as.character(q))
    # drop the empty padding first, then sort what is left
    opts <- sort(opts[opts != ''])
    result <- sapply(opts , function(x) apply(q, 1, function(y) x %in% y))
    result
             a     b    c
    [1,]  TRUE  TRUE TRUE
    [2,]  TRUE FALSE TRUE
    [3,] FALSE  TRUE TRUE
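
The logical matrix above says where each option occurs, but the question asked for a character matrix with each option in its own column and blanks elsewhere. A minimal sketch of that last step follows, using base R only. The helper `align_options()` and the variable names are hypothetical and not part of the original answer. The second part sketches one way to get the "grouped by retailer" ordering, assuming every option starts with a `U-` or `D-` prefix as described in the question's edit.

```r
# Hypothetical helper (not from the answer): one row per string, one column per
# option, holding the option's name where it is present and "" where it is not.
align_options <- function(vec, opts, sep = ";") {
  parts <- strsplit(vec, sep, fixed = TRUE)
  # Logical membership matrix, the same idea as `result` above.
  present <- do.call(rbind, lapply(parts, function(p) opts %in% p))
  out <- matrix("", nrow(present), ncol(present), dimnames = list(NULL, opts))
  # Fill only the TRUE cells with the name of their column.
  out[present] <- opts[col(present)[present]]
  out
}

vec <- c("b;a;c", "a;c", "c;b")
opts <- sort(unique(unlist(strsplit(vec, ";", fixed = TRUE))))
out <- align_options(vec, opts)
out

# Grouping by retailer (assumes a "U-" or "D-" prefix): order by the name after
# the prefix, then by the full option. method = "radix" sorts in the C locale,
# so the result does not depend on the session's collation settings.
real_opts <- c("U-Jet", "D-Amazon Marketplace", "U-Amazon", "D-Amazon")
grouped <- real_opts[order(sub("^[UD]-", "", real_opts), real_opts, method = "radix")]
grouped
```

Passing `grouped` as the `opts` argument would give the columns in that retailer-grouped order, since the column order simply follows the order of `opts`.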