r, regex, data.table, strsplit

Split column by multiple delimiters, keeping delimiters


How can I split a character column into 3 columns using %, -, and + as the possible delimiters, keeping the delimiters in the new columns?

Example Data:

library(data.table)

data <- data.table(x=c("92.1%+100-200","90.4%-1000+200", "92.8%-200+100", "99.2%-500-200","90.1%+500-200"))

Example desired data:

data.desired <- data.table(x1=c("92.1%", "90.4%", "92.8%","99.2%","90.1%")
                           , x2=c("+100","-1000","-200","-500","+500")
                           , x3=c("-200","+200","+100","-200","-200"))

Happy to award the points for a good answer and some help on this one!


Solution

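  • In data.table, use tstrsplit with a Perl-style regex that matches the zero-width position just before each + or -. Nothing is consumed by the match, so each sign stays attached to the number that follows it. The lookbehind (?<=.) requires at least one character before the split point, which rules out a match at the very start of the text being scanned. In the example data the % needs no handling of its own, because it always ends the first piece: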

    data[, c("x1","x2","x3") := tstrsplit(x, "(?<=.)(?=[+-])", perl=TRUE) ]
    data
    #                x    x1    x2   x3
    #1:  92.1%+100-200 92.1%  +100 -200
    #2: 90.4%-1000+200 90.4% -1000 +200
    #3:  92.8%-200+100 92.8%  -200 +100
    #4:  99.2%-500-200 99.2%  -500 -200
    #5:  90.1%+500-200 90.1%  +500 -200
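
An alternative sketch, not part of the original answer: instead of splitting, extract the pieces directly. The pattern `[+-]?[^+-]+` (an assumption about the data: each piece is an optional sign followed by characters that are neither + nor -) is matched with base R's `gregexpr` and `regmatches`, and data.table's `transpose` turns the per-row matches into columns. It assumes every row yields exactly three pieces.

```r
library(data.table)

data <- data.table(x = c("92.1%+100-200", "90.4%-1000+200", "92.8%-200+100",
                         "99.2%-500-200", "90.1%+500-200"))

# Each match is an optional leading sign plus a run of non-sign characters,
# so "92.1%+100-200" yields "92.1%", "+100", "-200".
pieces <- regmatches(data$x, gregexpr("[+-]?[^+-]+", data$x))

# transpose() converts the list of per-row matches into one vector per column.
data[, c("x1", "x2", "x3") := transpose(pieces)]
data
```

This avoids lookarounds entirely, which may be easier to read, but it would need adjusting if a row could contain fewer or more than three pieces.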