I have the following situation:
...
preg_match('/#(.+?):(.+?)#/im','partA#partB#partC:partD#partE#partF',$matches);
...
After execution, $matches becomes
Array
(
[0] => #partB#partC:partD#
[1] => partB#partC
[2] => partD
)
Wouldn't it be normal for $matches[1] to become partC, given that I use the non-greedy modifier "?"? Am I missing something?
I managed to solve it by using '/#([^#]+?):([^#]+?)#/im' as the pattern, but an explanation of why the original fails would help clear things up.
Thanks.
It makes sense when you think about the underlying theory behind regular expressions.
A regular expression engine works much like a finite state automaton (FSA), although PCRE, which PHP uses, is a backtracking engine rather than a pure FSA. In essence, it processes your string one character at a time from left to right, occasionally going backwards by "giving up" characters. In your example, the regex finds the first # and, since nothing else in the pattern constrains where the match may begin, starts matching the next token (.+?, in your case) from there. The lazy quantifier takes one character at a time and checks after each whether the colon can match next; because . also matches #, it walks straight past the second # and stops only when it reaches the colon. It then matches the next token (again, .+?). Since it's going left to right, that token matches up to the first hash after the colon and then stops, because it's being lazy.
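This is actually a common misconception: the ? modifier makes a quantifier lazy (also called non-greedy), but that only minimizes how far the quantifier extends to the right from the point where the match has already started. It doesn't make the engine look for the shortest possible match anywhere in the string; the leftmost starting position always wins.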
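To see this concretely, here is a small sketch using PHP's PREG_OFFSET_CAPTURE flag, which reports the byte offset of each match. The shortened patterns /#(.+?)#/ and /#(.+)#/ are made up for illustration and are not from the question; the point is that the lazy and greedy versions begin at the same offset and differ only in where they end.

```php
<?php
$subject = 'partA#partB#partC:partD#partE#partF';

// The original pattern. The i and m flags are dropped because they have no
// effect here: the pattern contains no letters and no ^ or $ anchors.
preg_match('/#(.+?):(.+?)#/', $subject, $orig, PREG_OFFSET_CAPTURE);

// Illustrative patterns (not from the question): lazy vs. greedy quantifier.
preg_match('/#(.+?)#/', $subject, $lazy, PREG_OFFSET_CAPTURE);
preg_match('/#(.+)#/', $subject, $greedy, PREG_OFFSET_CAPTURE);

// With PREG_OFFSET_CAPTURE each entry is [matched text, byte offset].
printf("original: '%s' starts at offset %d\n", $orig[0][0], $orig[0][1]);
printf("lazy:     '%s' starts at offset %d\n", $lazy[0][0], $lazy[0][1]);
printf("greedy:   '%s' starts at offset %d\n", $greedy[0][0], $greedy[0][1]);
```

All three matches start at offset 5, the first # in the string. The lazy group captures partB, the greedy one captures partB#partC:partD#partE, and the original pattern's first group captures partB#partC because that is the shortest text after offset 5 that is followed by a colon.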
To fix your original regex, you could modify it like this:
/.+#(.+?):(.+?)#/im
The leading greedy .+ consumes as much as possible and then backtracks only as far as needed, so the # in the pattern ends up on the last hash before the colon (for this input, which contains a single colon). That forces the first capture group to contain only the text between that hash and the colon. In the same vein, that group no longer needs the lazy modifier, yielding a final regex of:
/.+#(.+):(.+?)#/im
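As a quick check, the sketch below runs both patterns from this answer, plus the negated-class pattern from the question, against the sample string. The helper name firstGroups is hypothetical, and the i and m flags are omitted since they don't affect these patterns (no letters, no ^ or $ anchors).

```php
<?php
$subject = 'partA#partB#partC:partD#partE#partF';

// Hypothetical helper: returns capture groups 1 and 2, or null on no match.
function firstGroups(string $pattern, string $subject): ?array
{
    if (preg_match($pattern, $subject, $m) !== 1) {
        return null;
    }
    return [$m[1], $m[2]];
}

$patterns = [
    '/.+#(.+?):(.+?)#/',      // leading greedy .+ pushes the start to the right
    '/.+#(.+):(.+?)#/',       // same idea, with group 1 greedy
    '/#([^#]+?):([^#]+?)#/',  // the question's fix: groups cannot cross a #
];

foreach ($patterns as $pattern) {
    $groups = firstGroups($pattern, $subject);
    echo $pattern, ' => ', $groups === null ? 'no match' : implode(', ', $groups), "\n";
}
```

All three capture partC and partD here. They can differ on other inputs: the versions with the leading .+ need at least one character before the hash and settle on the last hash-colon pair that can match, while the negated-class version settles on the first.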