php regex preg-match-all counting

Count occurrences of specific word after a different, specific word is found


I am rather new to regex and am stuck on the following: I am trying to use preg_match_all to count the occurrences of hello that come after world.

If I use "world".+(hello), it matches up to the last hello; "world".*?(hello) stops at the first hello. Both give a count of one.

blah blah blah
hello
blah blah blah
class="world" 
blah blah blah
hello 
blah blah
hello
blah blah blah
hello
blah blah blah

I am expecting 3 as the count because the hello before world should not be counted.


Solution

  • You can use a single preg_match_all call here:

    $text = "blah blah blah\nhello\nblah blah blah\nclass=\"world\" \nblah blah blah\nhello \nblah blah\nhello\nblah blah blah\nhello\nblah blah blah";
    echo preg_match_all('~(?:\G(?!^)|\bworld\b).*?\K\bhello\b~s', $text);
    

    Details:

    • (?:\G(?!^)|\bworld\b) - either the end of the previous match or the whole word world. \G matches at the start of the string or at the end of the previous match, so the (?!^) negative lookahead is needed to exclude the start-of-string position
    • .*? - any zero or more chars, as few as possible (the s flag lets . match line breaks too)
    • \K - discards all text matched so far, so only hello ends up in the match value
    • \bhello\b - a whole word hello.

    NOTE: If you do not need word boundary check, you may remove \b from the pattern.
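
To make the difference concrete, here is a small sketch that runs the two patterns from the question next to the \G-based one. It assumes the question's patterns were run with the s flag (otherwise . could not cross the line breaks in this text); the variable names are only for illustration.

```php
<?php
$text = "blah blah blah\nhello\nblah blah blah\nclass=\"world\" \nblah blah blah\nhello \nblah blah\nhello\nblah blah blah\nhello\nblah blah blah";

// Patterns from the question (s flag assumed). Each finds a single match,
// because one match consumes everything from "world" up to a hello and
// "world" never appears again afterwards.
$greedy = preg_match_all('~"world".+(hello)~s', $text);
$lazy   = preg_match_all('~"world".*?(hello)~s', $text);

// \G-based pattern: every match after the first one starts where the
// previous match ended, so each later hello is found in turn.
$count = preg_match_all(
    '~(?:\G(?!^)|\bworld\b).*?\K\bhello\b~s',
    $text,
    $m,
    PREG_OFFSET_CAPTURE
);

echo "greedy: $greedy, lazy: $lazy, with \\G: $count\n";

// Because of \K, each match value is just "hello"; the offset shows
// where it sits in the text (always after "world").
foreach ($m[0] as [$value, $offset]) {
    echo "$value at offset $offset\n";
}
```

All reported offsets should be larger than the position of world, which confirms that the first hello is skipped.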

    If hello and world are user-defined strings, escape them with preg_quote before inserting them into the pattern:

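    $start = "world";
    $find = "hello";
    $text = "blah blah blah\nhello\nblah blah blah\nclass=\"world\" \nblah blah blah\nhello \nblah blah\nhello\nblah blah blah\nhello\nblah blah blah";
    // preg_quote escapes regex metacharacters and the ~ delimiter in both strings.
    // Word boundaries are left out: \b can fail when a user-supplied string starts or ends with a non-word character.
    echo preg_match_all('~(?:\G(?!^)|' . preg_quote($start, '~') . ').*?\K' . preg_quote($find, '~') . '~s', $text);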
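
If both strings are plain literals and no word-boundary check is needed, the same count can also be obtained without a regex. The following is only a sketch of that alternative, not the approach used above: countAfter is a hypothetical helper name, and it assumes both strings are non-empty.

```php
<?php
// Hypothetical helper: counts plain-substring occurrences of $find that
// come after the first occurrence of $start. Assumes non-empty strings.
function countAfter(string $text, string $start, string $find): int
{
    $pos = strpos($text, $start);
    if ($pos === false) {
        return 0; // the start marker never appears, so nothing qualifies
    }
    // Only the part of the text after the marker is searched.
    $rest = substr($text, $pos + strlen($start));
    return substr_count($rest, $find);
}

$text = "blah blah blah\nhello\nblah blah blah\nclass=\"world\" \nblah blah blah\nhello \nblah blah\nhello\nblah blah blah\nhello\nblah blah blah";
echo countAfter($text, "world", "hello");
```

Note that this matches substrings, so helloworld or worldwide would count as well; the regex version with \b is the better fit when whole words matter.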