
Separating a column of participant IDs and cues into two columns in R


I am working in R and I have data that looks like this:

A              B
p1___box       11
p2___cage      12
p3___chair     13
p1___sofa      14
p3___desk      15
p1___garage    18
p2___house     07
p2___building  19
p3___street    21

I need to split column A into two columns so that the data looks like this:

A      B         C
p1___  box       11
p2___  cage      12
p3___  chair     13
p1___  sofa      14
p3___  desk      15
p1___  garage    18
p2___  house     07
p2___  building  19
p3___  street    21

I am trying to use the extract function from the tidyverse, but I cannot work out how to use it correctly.


Solution

    1. Use tidyr's extract to split column A into two columns (the underscores are treated as a separator and dropped):
    tidyr::extract(df, A, c('col1', 'col2'), '(.*?)_+(.*)')
    
    #  col1     col2  B
    #1   p1      box 11
    #2   p2     cage 12
    #3   p3    chair 13
    #4   p1     sofa 14
    #5   p3     desk 15
    #6   p1   garage 18
    #7   p2    house  7
    #8   p2 building 19
    #9   p3   street 21
    
    2. Using str_match from stringr:
    cbind(df[2], stringr::str_match(df$A, "(.*?)_+(.*)")[, -1])
    
    3. Base R option with sub:
    transform(df, col1 = sub('_.*', '', A), col2 = sub('.*_', '', A))
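
The options above drop the underscores, whereas the desired output in the question keeps them attached to the participant ID (p1___). A minimal sketch of one way to keep them, assuming the data frame is called df as in the answer: move the underscores inside the first capture group. The column names id and cue and the reconstructed data are illustrative assumptions, not part of the original post.

```r
# Hypothetical reconstruction of the data from the question.
# B is stored as numeric here, so "07" becomes 7.
df <- data.frame(
  A = c("p1___box", "p2___cage", "p3___chair", "p1___sofa", "p3___desk",
        "p1___garage", "p2___house", "p2___building", "p3___street"),
  B = c(11, 12, 13, 14, 15, 18, 7, 19, 21),
  stringsAsFactors = FALSE
)

# Group 1: everything up to and including the run of underscores.
# Group 2: whatever follows (the cue).
# `id` and `cue` are illustrative column names.
res <- tidyr::extract(df, A, into = c("id", "cue"), regex = "^([^_]*_+)(.*)$")

# Base R equivalent with the same pattern, for comparison.
# Unlike extract, transform keeps the original column A.
base_res <- transform(
  df,
  id  = sub("^([^_]*_+).*$", "\\1", A),
  cue = sub("^[^_]*_+", "", A)
)
```

Here the pattern `[^_]*` stands in for the lazy `.*?` used in the answer. For this data both stop at the first underscore, but the negated character class does not rely on lazy matching.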