Tags: r, tidyverse, regex-lookarounds, stringr

Using lookbehinds to extract groups of strings in str_extract()


library(stringr)

I tried following the advice here but could not make it work for my problem. Using stringr, I need to extract all the characters that follow the first run of letters and the single underscore after it.

The following extracts exactly what I don't want:

str_extract("mean_q4.8_addiction_critCount", "(^[a-z]*_)")

# [1] "mean_"

What I want is

# [1] "q4.8_addiction_critCount"

Based on the link I inserted above, I tried a positive lookbehind:

str_extract("mean_q4.8_addiction_critCount", "(?<=^[a-z]*_)\\w+")

But got the error

# Error in stri_extract_first_regex(string, pattern, opts_regex = opts(pattern)) : 
#  Look-Behind pattern matches must have a bounded maximum length. (U_REGEX_LOOK_BEHIND_LIMIT)

And I couldn't work out how to constrain the maximum length.

Any advice much appreciated.


Solution

  • Can't you do the opposite instead and remove everything up to and including the first underscore?

    sub('.*?_', '', 'mean_q4.8_addiction_critCount')
    #[1] "q4.8_addiction_critCount"
    

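    As for the look-behind approach, you can extract everything after the first underscore. The look-behind here is a single fixed character, so it has a bounded length and does not trigger the error: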

    stringr::str_extract("mean_q4.8_addiction_critCount", "(?<=_).*")
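
For completeness, here is a minimal sketch of three stringr variants that keep the `^[a-z]` prefix condition from the question. The variable names are illustrative, and the `{1,20}` upper bound in the last pattern is an assumption about how long the leading run of letters can be; it replaces the unbounded `*` that caused the `U_REGEX_LOOK_BEHIND_LIMIT` error.

```r
library(stringr)

x <- "mean_q4.8_addiction_critCount"

# Option 1: remove the leading lowercase letters plus the first underscore.
res_remove <- str_remove(x, "^[a-z]*_")
res_remove
# [1] "q4.8_addiction_critCount"

# Option 2: capture everything after the prefix.
# str_match() returns a matrix; column 2 holds the first capture group.
res_match <- str_match(x, "^[a-z]*_(.*)")[, 2]
res_match
# [1] "q4.8_addiction_critCount"

# Option 3: look-behind with a bounded quantifier.
# {1,20} is an assumed limit on the prefix length, not a required value.
# ".*" is used instead of "\\w+" because "\\w" does not match the "." in "q4.8".
res_lookbehind <- str_extract(x, "(?<=^[a-z]{1,20}_).*")
res_lookbehind
```

Options 1 and 2 avoid look-behind entirely, so they are probably the simpler choice when the prefix length is not known in advance.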