
R: Extracting characters from string vectors to data frame rows


I have a data frame with one column containing a list of words. I'd like to split each word into its characters and store each character in its own position column of the data frame.

For example, if the data frame is defined like this:

    words <- c('which', 'there', 'their', 'would')
    words <- as.data.frame(words)

I'd like it to look like this at the end:

    words first_pos second_pos third_pos fourth_pos fifth_pos
    which         w          h         i          c         h
    there         t          h         e          r         e
    their         t          h         e          i         r
    would         w          o         u          l         d

What I have so far is:

    library(stringr)
    position <- c("first_pos", "second_pos", "third_pos", "fourth_pos", "fifth_pos")
    words[position] <- NA
    dismantled <- str_split(words$words, "")

This splits each word into a list of characters and creates the empty columns I need, but I'm not sure how to fill those columns with the letters.


Solution

  • One option is to insert a space between the characters of each word and then split the result into columns with separate():

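    library(tidyverse)
    words %>%
      # put a space after every character, then drop the trailing space
      mutate(words1 = sub("\\s+$", "", gsub('(.{1})', '\\1 ', words))) %>%
      # split the spaced-out string on the spaces into five columns
      separate(words1, into = paste0(1:5, "_pos"))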
    
      words 1_pos 2_pos 3_pos 4_pos 5_pos
    1 which     w     h     i     c     h
    2 there     t     h     e     r     e
    3 their     t     h     e     i     r
    4 would     w     o     u     l     d
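If you would rather stay in base R and finish the approach from the question, a minimal sketch is to split each word into characters, bind the pieces row-wise into a matrix, and attach that matrix as the position columns. It assumes the `words` column and the `position` names from the question, and that no word is longer than five characters; shorter words are padded with NA (the padding step is an addition, not part of the original question).

```r
# Base R sketch: one column per character position, named as in the question.
words <- data.frame(words = c("which", "there", "their", "would"),
                    stringsAsFactors = FALSE)
position <- c("first_pos", "second_pos", "third_pos", "fourth_pos", "fifth_pos")

# strsplit() needs a character vector; as.character() guards against a factor
# column (the default for data.frame()/as.data.frame() before R 4.0).
chars <- strsplit(as.character(words$words), "")

# Pad shorter words with NA so every row has the same number of characters.
max_len <- max(lengths(chars))
chars <- lapply(chars, `length<-`, max_len)

# One row per word, one column per character position.
char_mat <- do.call(rbind, chars)

# Assumes max_len <= length(position); longer words would need more names.
result <- cbind(words,
                setNames(as.data.frame(char_mat, stringsAsFactors = FALSE),
                         position[seq_len(max_len)]))
print(result)
```

Binding with do.call(rbind, ...) turns the list that strsplit() or str_split() returns into a character matrix, which is exactly the shape the position columns need.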