
R: strsplit based on two conditions, keeping the delimiter


I am trying to split strings based on different criteria: some should be split before "traction" and some before "ramasse". I looked up the regular-expression syntax used by grepl but didn't really understand it.

A data frame called export has a column ref, which holds character values ending in either "traction" or "ramasse".

> export$ref
                        ref
[1] "62133130_074_traction"
[2]  "62156438_074_ramasse"
[3]  "62153874_070_ramasse"
[4] "62138861_074_traction"

I want to split each value in the ref column into two parts:

                ref           R&T
[1] "62133130_074_"    "traction"
[2] "62156438_074_"     "ramasse"
[3] "62153874_070_"     "ramasse"
[4] "62138861_074_"    "traction"

What I tried (none of these worked):

strsplit(export$ref, c("traction", "ramasse"))
strsplit(export$ref, "\\_(?<=\\btraction)|\\_(?<=\\bramasse)", perl = TRUE)
strsplit(export$ref, "(?=['traction''ramasse'])", perl = TRUE)

Any help would be appreciated!


Solution

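  • Here is an option using stringr::str_split. It splits on the underscore that is immediately followed by letters, so that underscore is dropped from the result: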

    library(stringr)
    # Split on the "_" that is followed by letters; return a character matrix
    str_split(ref, pattern = "_(?=[A-Za-z]+)", simplify = TRUE)
    #    [,1]           [,2]
    #[1,] "62133130_074" "traction"
    #[2,] "62156438_074" "ramasse"
    #[3,] "62153874_070" "ramasse"
    #[4,] "62138861_074" "traction"
    

    Sample data

    ref <- c(
        "62133130_074_traction",
        "62156438_074_ramasse",
        "62153874_070_ramasse",
        "62138861_074_traction")