
Tidyr Separate using regex


I searched and searched for this and found similar stuff but nothing quite right. Hopefully this hasn't been answered.

Let's say I have a column with Y, N, and sometimes extra information:

    df<-data.frame(Names=c("Patient1","patient2","Patient3","Patient4","patient5"),Surgery=c("Y","N","Y-this kind of surgery","See note","Y"))

And I'm trying to separate out the Y or N into one column, and everything else from that column into another.

I've tried

    df%>%separate('Surgery',c("Surgery","Notes"), sep=" ")

This ends up with "See" in one column and "note" in the next.

    df%>%separate('Surgery',c("Surgery","Notes"), sep = '^Y|^N')

This just gets weird: the leading Y or N is treated as the separator and dropped.

    df%>%separate('Surgery',c("Surgery","Notes"), sep= "^[YN]?")

This splits the notes correctly, but removes the Y and N.

Anybody know how to separate it? The result I'm looking for would have only Y or N in the surgery column and anything else pushed to a different column.


Solution

  • We can use extract from tidyr, which fills the new columns from regex capture groups instead of splitting on a separator

    library(tidyr)
    library(dplyr)
    df %>% 
      extract(Surgery, into = c("Surgery", "Notes"), "^([YN]*)[[:punct:]]*(.*)")
    #     Names Surgery                Notes
    #1 Patient1       Y                     
    #2 patient2       N                     
    #3 Patient3       Y this kind of surgery
    #4 Patient4                     See note
    #5 patient5       Y
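
To see what each capture group picks up, here is a minimal base R sketch that applies the same pattern with sub(). It is only an illustration of the regex, not a replacement for extract; the names x, pattern, surgery and notes are made up for this example.

```r
# Same values as the Surgery column in the question
x <- c("Y", "N", "Y-this kind of surgery", "See note", "Y")

# Group 1: zero or more leading Y/N characters
# [[:punct:]]*: optional punctuation (such as "-") that is discarded
# Group 2: whatever is left of the string
pattern <- "^([YN]*)[[:punct:]]*(.*)"

# The pattern matches the whole string, so sub() returns just the chosen group
surgery <- sub(pattern, "\\1", x)
notes   <- sub(pattern, "\\2", x)

data.frame(Surgery = surgery, Notes = notes)
```

Because the first group is [YN]*, it can match nothing at all, which is why "See note" ends up entirely in Notes with an empty Surgery. The flip side is that a note that itself starts with Y or N (say "No surgery") would have that first letter captured as the flag, so the pattern may need tightening if such values can occur.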