Tags: php, regex, preg-match-all, pcre, multiline

Regex in multiline mode with an optional group skips valid data


Consider the following example:

$payload = '
ababaaabbb =%=
ababaaabbb =%=
ababaa     =%=
';

$pattern = '/^[ab]+\s*(?:=%=)?$/m';
preg_match_all($pattern, $payload, $matches);
var_dump($matches);

The expected and actual match results are the same:

"ababaaabbb =%="
"ababaaabbb =%="
"ababaa     =%="

But if $payload is changed to

$payload = '
ababaaabbb =%=
ababaaabbb =%=
ababaa     =%'; // "=" sign removed at EOL

the actual result is

"ababaaabbb =%="
"ababaaabbb =%="

but the expected result is

"ababaaabbb =%="
"ababaaabbb =%="
"ababaa     "

Why does this happen? The group (?:=%=)? is optional because of the trailing ?, so the last line of the payload should also appear in the match results.


Solution

  • Have a look at your current regex graph:

    [Image: graph of the current regex, /^[ab]+\s*(?:=%=)?$/m]

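    The =%= is optional (see how the branch between the whitespace and the end of line forks), but the end of line is required. That means that after one or more a or b characters and zero or more whitespace characters, either =%= followed by the end of line, or the end of line itself, must occur. However, your 3rd line has =% at that position, which is neither, so there is no match.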

    Now, when you move the $ anchor into the optional group:

    [Image: graph of the regex with the $ anchor inside the optional group]

    The end of line is now optional too, so a match is returned as soon as one or more a or b characters and any optional whitespace have been matched, even when =%= followed by the end of line is missing.
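
The difference between the two patterns can be checked with a short script. This is only a sketch: the answer does not spell out the modified pattern in text, so the $fixed pattern below is an assumption based on the description "move the $ anchor into the optional group", and the variable names are illustrative.

```php
<?php
// Payload from the question, with the final "=" removed from the last line.
$payload = '
ababaaabbb =%=
ababaaabbb =%=
ababaa     =%';

// Original pattern: $ sits outside the optional group, so the end of line
// is always required right after the letters, whitespace and optional "=%=".
$original = '/^[ab]+\s*(?:=%=)?$/m';

// Assumed fix: $ moved inside the optional group, so "=%=" plus end of line
// is matched when present and skipped as a whole otherwise.
$fixed = '/^[ab]+\s*(?:=%=$)?/m';

preg_match_all($original, $payload, $originalMatches);
preg_match_all($fixed, $payload, $fixedMatches);

var_dump($originalMatches[0]); // full matches found by the original pattern
var_dump($fixedMatches[0]);    // full matches found by the modified pattern
```

With the modified pattern, the third match keeps the trailing spaces captured by \s*, so apply rtrim() to the matches if that whitespace is unwanted.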