Tags: r, regex, string, parsing, gsub

R - Extract info after nth occurrence of a character from the right of string


I've seen many variations of extracting with gsub, but they mostly deal with extracting from left to right or after a single occurrence. I want to match from right to left: count four occurrences of -, then extract everything between the 3rd and 4th occurrence.

For example:

string                       outcome
here-are-some-words-to-try   some
a-b-c-d-e-f-g-h-i            f

Here are a few references I've tried using:


Solution

  • You could use

    ([^-]+)(?:-[^-]+){3}$
    

    See a demo on regex101.com.


    In R this could be

    library(dplyr)
    library(stringr)

    df <- data.frame(
      string = c('here-are-some-words-to-try', 'a-b-c-d-e-f-g-h-i', ' no dash in here'),
      stringsAsFactors = FALSE
    )

    # str_match() returns a matrix: column 1 is the full match, column 2 the first capture group
    df <- df %>%
      mutate(outcome = str_match(string, '([^-]+)(?:-[^-]+){3}$')[, 2])
    df
    

    And yields

                          string outcome
    1 here-are-some-words-to-try    some
    2          a-b-c-d-e-f-g-h-i       f
    3            no dash in here    <NA>
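
For readers who would rather not load dplyr and stringr, the same idea can be sketched in base R. The pattern captures one run of non-dash characters that is followed by exactly three more "-field" groups before the end of the string, so a string with fewer than four fields never matches. The helper name nth_from_right and its arguments are hypothetical, introduced only for illustration. It swaps the regex for a strsplit() approach and should agree with the regex only when every field is non-empty.

```r
strings <- c("here-are-some-words-to-try", "a-b-c-d-e-f-g-h-i", " no dash in here")

# Base R version of the regex from the answer. regexec() returns the full
# match followed by the capture groups; a non-match yields character(0).
pattern <- "([^-]+)(?:-[^-]+){3}$"
matches <- regmatches(strings, regexec(pattern, strings, perl = TRUE))
from_regex <- vapply(
  matches,
  function(g) if (length(g) >= 2) g[2] else NA_character_,
  character(1)
)

# Hypothetical helper (not part of the original answer): split on the
# separator and pick the field that has n fields to its right.
# Returns NA when the string has too few fields.
nth_from_right <- function(x, n = 3, sep = "-") {
  parts <- strsplit(x, sep, fixed = TRUE)
  vapply(parts, function(p) {
    idx <- length(p) - n
    if (idx >= 1) p[idx] else NA_character_
  }, character(1))
}

from_split <- nth_from_right(strings, n = 3)

print(from_regex)  # "some" "f" NA
print(from_split)  # "some" "f" NA
```

The split-based version makes the count a plain argument, which may be easier to adjust than the {3} quantifier. The regex version stays closer to the answer above and rejects empty fields such as "a--b-c-d".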