
R cSplit only using first delimiter in string


I had a long list with two columns, where the same pair of strings appeared in multiple rows. So I used paste to concatenate the two columns with - as the separator, and then used setDT to return the unique set of concatenated strings with their frequencies.
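For readers who want to reproduce the setup, here is a minimal sketch of the concatenate-and-count step described above. The original code is not shown in the question, so the input data frame `d` and its column names `first` and `second` are assumptions made up for illustration; only `paste` and `setDT` come from the description.

```r
library(data.table)

# Hypothetical input: two columns with repeated pairs (names are assumed)
d <- data.frame(
  first  = rep(c("A", "A", "B"), times = c(4, 5, 1)),
  second = rep(c("hello", "Hi-there", "HELLO"), times = c(4, 5, 1)),
  stringsAsFactors = FALSE
)

# Concatenate the two columns with "-" as the separator
d$conc <- paste(d$first, d$second, sep = "-")

# Convert to a data.table and count how often each concatenation occurs
d4 <- setDT(d)[, .(freq = .N), by = conc]
```

With this input, `d4` has one row per unique concatenation, matching the example table further down.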

Now I want to reverse my concatenation.

I tried:

library(splitstackshape)
d5 <- cSplit(d4, 'conc', '-', 'wide')

However, the strings in my second column sometimes contain additional -'s of their own.

To get around this I'd like cSplit to ONLY use the first - delimiter.

Example:

 conc      freq
 A-hello      4
 A-Hi-there   5
 B-HELLO      1

With the example above, cSplit would return:

freq conc_001  conc_002  conc_003
   4        A     hello        NA
   5        A        Hi     there
   1        B     HELLO        NA

I would like:

freq conc_001  conc_002
   4        A     hello
   5        A  Hi-there
   1        B     HELLO

Solution

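  • Here is another idea. Because sub replaces only the first match (unlike gsub, which replaces all of them), we can use it to change only the first delimiter in each string. We then call cSplit with the new delimiter. Here df stands for the data from the question, and the approach assumes the strings contain no spaces; otherwise, pick a replacement character that does not occur in the data.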

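    library(splitstackshape)
    # sub() replaces only the first '-' in each string
    df$conc <- sub('-', ' ', df$conc)
    # split on the temporary delimiter into wide columns
    cSplit(df, 'conc', ' ', 'wide')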
    #   freq conc_1   conc_2
    #1:    4      A    hello
    #2:    5      A Hi-there
    #3:    1      B    HELLO
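
If a temporary delimiter is undesirable (for example because the strings may contain spaces), the same split can be sketched in base R with two sub calls and no intermediate replacement. This is an alternative to the cSplit approach above, not part of it; the output column names conc_1 and conc_2 are chosen here only to mirror the output shown above.

```r
# Example data from the question
df <- data.frame(
  conc = c("A-hello", "A-Hi-there", "B-HELLO"),
  freq = c(4, 5, 1),
  stringsAsFactors = FALSE
)

# Everything before the first "-": drop the first "-" and all that follows
df$conc_1 <- sub("-.*$", "", df$conc)

# Everything after the first "-": drop the leading run of non-"-" characters
# together with the first "-"
df$conc_2 <- sub("^[^-]*-", "", df$conc)

# Remove the original concatenated column
df$conc <- NULL
```

Note that a string with no - at all is left unchanged by both patterns, so it would appear in both columns; check for that case first if it can occur in your data.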