
R: Split string and keep substrings right-hand of match?


How can I do this with strsplit() in R? Splitting should stop once no more first names separated by slashes remain, and the right-hand substring should be kept as shown in the result.

a <- c("tim/tom meyer XY900 123kncjd", "sepp/max/peter moser VK123 456xyz")

# result: 
c("tim meyer XY900 123kncjd", "tom meyer XY900 123kncjd", "sepp moser VK123 456xyz", "max moser VK123 456xyz", "peter moser VK123 456xyz")

Solution

  • Here is one possibility using a few different base R string functions.

    ## count the first names in each element (one more than the number of "/")
    len <- lengths(gregexpr("/", sub(" .*", "", a), fixed = TRUE)) + 1L
    ## extract all the first names 
    ## using the fact that they all end at the first space character
    fn <- scan(text = a, sep = "/", what = "", comment.char = " ")
    ## paste them together
    paste0(fn, rep(regmatches(a, regexpr(" .*", a)), len))
    # [1] "tim meyer XY900 123kncjd" "tom meyer XY900 123kncjd"
    # [3] "sepp moser VK123 456xyz"  "max moser VK123 456xyz"  
    # [5] "peter moser VK123 456xyz"
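The `scan()` call is the least obvious step. Each element of `a` is read as one line of text; `comment.char = " "` discards everything from the first space onward, and `sep = "/"` splits what is left into separate first names. Run on its own (with `quiet = TRUE` added here only to suppress the "Read 5 items" message), it gives:

```r
a <- c("tim/tom meyer XY900 123kncjd", "sepp/max/peter moser VK123 456xyz")

# Each element of `a` is one line of input.
# comment.char = " " treats everything after the first space as a comment,
# and sep = "/" splits the remaining first names into separate items.
fn <- scan(text = a, sep = "/", what = "", comment.char = " ", quiet = TRUE)
fn
# [1] "tim"   "tom"   "sepp"  "max"   "peter"
```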
    

    Addition: Here is a second possibility that uses a little less code. It may also be slightly faster.

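    ## split on "/" or on everything from the first space onward;
    ## strsplit() drops a match at the end of a string, so no empty piece is left
    s <- strsplit(a, "\\/|( .*)")
    ## repeat each remainder once per first name and paste it back on
    paste0(unlist(s), rep(regmatches(a, regexpr(" .*", a)), lengths(s)))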
    # [1] "tim meyer XY900 123kncjd" "tom meyer XY900 123kncjd"
    # [3] "sepp moser VK123 456xyz"  "max moser VK123 456xyz"  
    # [5] "peter moser VK123 456xyz"
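Both versions rely on the example data. In the first, `gregexpr()` returns `-1` when an element has no "/", and `lengths()` still counts that as one match, so a single first name would be repeated twice. In both, `regmatches(a, regexpr(" .*", a))` silently drops elements that contain no space, which would misalign the pasted pieces. A minimal sketch that avoids both issues, assuming the first names always come before the first space, wraps the same idea in a hypothetical helper `expand_first_names()` (not part of the original answer):

```r
# Hypothetical helper: expand "a/b/c rest" into "a rest", "b rest", "c rest".
expand_first_names <- function(x) {
  # Part before the first space: the slash-separated first names
  names_part <- sub(" .*", "", x)
  # Part from the first space onward ("" if there is no space)
  rest <- sub("^[^ ]*", "", x)
  # Split the first names; an element without "/" yields a single name
  fn <- strsplit(names_part, "/", fixed = TRUE)
  # Repeat each remainder once per first name and paste it back on
  paste0(unlist(fn), rep(rest, lengths(fn)))
}

a <- c("tim/tom meyer XY900 123kncjd", "sepp/max/peter moser VK123 456xyz")
expand_first_names(a)
expand_first_names("anna smith AB1 x")
# [1] "anna smith AB1 x"
```

Using `sub()` for the remainder, rather than `regmatches()`, keeps the output the same length as the input, so `rep()` always lines up with the split names.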