r  dataframe  tidyr  stringr

Separate string into many columns


I'd like to split a string into its individual letters or symbols and build a new data.frame with one column per character. I tried the separate function from the tidyr package, but the result is not what I expected.

library(tidyr)

df <- data.frame(x = c('house', 'mouse'), y = c('count', 'apple'), stringsAsFactors = FALSE)

# unexpected result
df[1, ] %>% separate(x, c('A1', 'A2', 'A3', 'A4', 'A5'), sep = '')
#   A1 A2 A3 A4 A5     y
# 1                count

Expected output

A1  A2  A3  A4  A5
 h   o   u   s   e
 m   o   u   s   e

Solutions using stringr are welcome.


Solution

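  • We can use regex lookarounds in sep to match the zero-width boundary between adjacent characters. Note that the [a-z] classes used here only match boundaries between lowercase letters, so strings containing other symbols would need a broader class.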

    library(dplyr)
    library(tidyr)
    library(stringr)
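    df %>%
       select(x) %>%   # keep only the column to split
       # split at every position that has a lowercase letter on both sides
       separate(x, into = str_c("A", 1:5), sep = "(?<=[a-z])(?=[a-z])")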
    #  A1 A2 A3 A4 A5
    #1  h  o  u  s  e
    #2  m  o  u  s  e
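
  • Since the question also welcomes stringr, here is a minimal sketch of an alternative that skips separate entirely. It assumes every string has the same number of characters (5 in the example); the names chars and res are just illustrative. str_split_fixed with an empty pattern splits between every character and returns a character matrix, which is then converted to a data.frame.

```r
library(stringr)

df <- data.frame(x = c("house", "mouse"), y = c("count", "apple"),
                 stringsAsFactors = FALSE)

# An empty pattern splits between every character;
# n = 5 fixes the number of pieces (assumed: all strings have 5 characters).
chars <- str_split_fixed(df$x, "", n = 5)

# Convert the character matrix to a data.frame and name the columns A1..A5.
res <- as.data.frame(chars, stringsAsFactors = FALSE)
names(res) <- str_c("A", seq_len(ncol(res)))
res
#   A1 A2 A3 A4 A5
# 1  h  o  u  s  e
# 2  m  o  u  s  e
```

    Because the split is on character boundaries rather than on a letter class, this variant should also handle digits or punctuation in the strings.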