
Split character by multiple criteria in R


I have a character vector like this:

c("variable1+variable2 + variable3*variable4+ variable5")

I would like to split this string into a vector like:

c("variable1", "variable2", "variable3", "variable4", "variable5")

IMPORTANT 1: note that there are two kinds of separators, + and *. IMPORTANT 2: note that sometimes there is a blank space between the word I want to extract and the separator, and other times there is none.


Solution

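  • In base R, we can use strsplit with a regular expression that matches either separator (+ or *) together with any whitespace around it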

    out <- strsplit("variable1+variable2 + variable3*variable4+ variable5", 
              "\\s*[*+]\\s*")[[1]]
    

    -output

    out
    [1] "variable1" "variable2" "variable3" "variable4" "variable5"
    

    The structure is

    dput(out)
    c("variable1", "variable2", "variable3", "variable4", "variable5"
    )
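
As a further sketch (not part of the original answer), the same pattern can be applied to several strings at once, since strsplit is vectorised and returns a list with one element per input string. The second input string below is a made-up example, and the call to trimws is an added assumption: it only matters if a string can start or end with spaces, which the pattern alone would leave attached to the first or last piece.

```r
# Inputs: the string from the question plus a hypothetical one with padding at both ends
x <- c("variable1+variable2 + variable3*variable4+ variable5",
       "  a * b+c ")

# Pattern pieces:
#   \\s*   zero or more whitespace characters before the separator
#   [*+]   a literal "*" or "+" (inside a character class neither needs escaping)
#   \\s*   zero or more whitespace characters after the separator
# trimws() removes leading/trailing whitespace before splitting
parts <- strsplit(trimws(x), "\\s*[*+]\\s*")

parts[[1]]
parts[[2]]
```

Each element of parts is a character vector; unlist(parts) would flatten them into one vector if the grouping by input string is not needed.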