regex, regex-lookarounds

Regex: capture each n-letter word between two words of a sentence


I'm having a hard time selecting only the words of a given length that appear between two words of a sentence. For example, take the sentence: "this is the start some words are to be selected end no more select"

Let's say I'd like to select the words of 3+ letters between the words 'start' and 'end'. The result should capture some, words, are and selected, ignoring to and be.

https://regex101.com/r/Ost7Wn/3

Selecting [\w]{3,} works by itself, but I can't figure out how to restrict it to the part of the sentence between 'start' and 'end', so that only the n-letter words appearing between them are matched. I've tried many things, from lookarounds to capture groups, but I really can't get it!

Any ideas? Thanks.


Solution

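  • You may use this regex with a lookahead and \G (it needs an engine that supports \G and \h, such as PCRE; START and END are matched case-sensitively unless the case-insensitive flag is set):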

    (?:\bSTART\b|(?!^)\G)\h+(?!END\b).*?\b(\w{3,})(?=.*?\bEND\b)
    

    RegEx Demo

    RegEx Details:

    • (?:\bSTART\b|(?!^)\G): Match the word START, or continue from the end of the previous match. The (?!^) lookahead stops \G from matching at the start of the string.
    • \G: Asserts position at the end of the previous match, or at the start of the string for the first match
    • \h+(?!END\b).*?\b(\w{3,}): Match 1+ horizontal whitespace not followed by the word END, then lazily skip 0 or more characters (the shorter words), followed by a word of 3+ characters, which is captured in group #1
    • (?=.*?\bEND\b): Lookahead to assert presence of word END ahead
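
For engines without \G (Python's built-in re module, for example), the same words can be collected in two steps instead of one pattern: first isolate the text between the two markers, then pick the long-enough words out of it. This is only a minimal sketch of that alternative, not the regex above; the function name words_between and its parameters are made up for illustration, and it assumes you want the first 'start' and the nearest following 'end', matched case-insensitively.

```python
import re


def words_between(text, start="start", end="end", min_len=3):
    """Return the words of at least min_len characters found between start and end."""
    # Step 1: isolate the text between the first `start` and the nearest `end`.
    segment = re.search(
        rf"\b{re.escape(start)}\b(.*?)\b{re.escape(end)}\b",
        text,
        flags=re.IGNORECASE | re.DOTALL,
    )
    if segment is None:
        # One of the marker words is missing, so there is nothing to select.
        return []
    # Step 2: collect every whole word made of min_len or more word characters.
    return re.findall(rf"\b\w{{{min_len},}}\b", segment.group(1))


sentence = "this is the start some words are to be selected end no more select"
print(words_between(sentence))  # ['some', 'words', 'are', 'selected']
```

Splitting the work this way keeps each pattern simple, at the cost of needing code around the regex rather than a single expression you can paste into a search box.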