I am rather new to regex and am stuck on the following, where I try to use preg_match_all to count the number of occurrences of hello after world.
If I use "world".+(hello), it matches all the way to the last hello; "world".*?(hello) stops at the first hello. Both give a count of one.
blah blah blah
hello
blah blah blah
class="world"
blah blah blah
hello
blah blah
hello
blah blah blah
hello
blah blah blah
I am expecting 3 as the count, because the hello before world should not be counted.
You can use a single preg_match_all call here:
$text = "blah blah blah\nhello\nblah blah blah\nclass=\"world\" \nblah blah blah\nhello \nblah blah\nhello\nblah blah blah\nhello\nblah blah blah";
echo preg_match_all('~(?:\G(?!^)|\bworld\b).*?\K\bhello\b~s', $text);
See the regex demo and the PHP demo. Details:
(?:\G(?!^)|\bworld\b) - either the end of the previous match or a whole word world. The \G(?!^) part does the first check: \G matches either the start of the string or the end of the previous match, so the start of string position has to be excluded, and this is done with the (?!^) negative lookahead
.*? - any zero or more chars, as few as possible (with the s flag, . also matches line breaks)
\K - discards all text matched so far
\bhello\b - a whole word hello.
NOTE: If you do not need the word boundary check, you may remove \b from the pattern.
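To see how the matches chain from one hello to the next, here is a minimal sketch that prints where each match starts. It reuses the $text from above; the PREG_OFFSET_CAPTURE flag and the variable names $matches and $worldPos are only added for illustration. Because of \K, each reported match is just hello.

```php
<?php
$text = "blah blah blah\nhello\nblah blah blah\nclass=\"world\" \nblah blah blah\nhello \nblah blah\nhello\nblah blah blah\nhello\nblah blah blah";

// PREG_OFFSET_CAPTURE turns each match into [matched text, byte offset]
$count = preg_match_all('~(?:\G(?!^)|\bworld\b).*?\K\bhello\b~s', $text, $matches, PREG_OFFSET_CAPTURE);

// Byte offset of the first "world", to compare against the match offsets
$worldPos = strpos($text, 'world');

foreach ($matches[0] as $m) {
    // $m[0] is the matched text, $m[1] is where it starts in $text
    echo $m[0], ' at offset ', $m[1], ($m[1] > $worldPos ? ' (after world)' : ' (before world)'), "\n";
}
echo $count, "\n"; // 3
```

The hello before world never shows up: at that point there is no previous match for \G to continue from, and world has not been seen yet.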
If hello and world are user-defined strings, you must preg_quote them in the pattern:
$start = "world";
$find = "hello";
$text = "blah blah blah\nhello\nblah blah blah\nclass=\"world\" \nblah blah blah\nhello \nblah blah\nhello\nblah blah blah\nhello\nblah blah blah";
echo preg_match_all('~(?:\G(?!^)|' . preg_quote($start, '~') . '\b).*?\K' . preg_quote($find, '~') . '~s', $text);
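If the same count is needed in several places, the pattern building could be wrapped in a small helper. This is only a sketch under assumptions: the function name countAfter is hypothetical, and it leaves out the \b checks, since arbitrary user input may not start or end with a word character.

```php
<?php
// Hypothetical helper (not a PHP built-in): counts occurrences of $find
// that appear after the first occurrence of $start in $text.
function countAfter(string $start, string $find, string $text): int
{
    // preg_quote escapes regex metacharacters and the ~ delimiter
    $pattern = '~(?:\G(?!^)|' . preg_quote($start, '~') . ').*?\K' . preg_quote($find, '~') . '~s';
    $count = preg_match_all($pattern, $text);
    // preg_match_all returns false on error; this sketch treats that as zero
    return $count === false ? 0 : $count;
}

echo countAfter('world', 'hello', "hello world hello hello"), "\n"; // 2
echo countAfter('a.b', 'x+y', "x+y a.b x+y axb"), "\n";             // 1
```

In the second call, the . and + are matched literally thanks to preg_quote, so axb is not treated as a start marker and x+y is not read as a quantifier.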