I'm using capturing groups in regular expressions for the first time, and I can't see what my problem is, since I assume that the regex engine scans the string left to right.
I'm trying to convert an UpperCamelCase string into a hyphenated-lowercase-string, for example:
HelloWorldThisIsATest => hello-world-this-is-a-test
My precondition is an alphabetic string, so I don't need to worry about numbers or other characters. Here is what I tried:
mb_strtolower(preg_replace('/([A-Za-z])([A-Z])/', '$1-$2', "HelloWorldThisIsATest"));
The result:
hello-world-this-is-atest
This is almost what I want, except there should be a hyphen between a and test. I've already included A-Z in my first capturing group, so I would assume that the engine sees AT and hyphenates that.
What am I doing wrong?
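A minimal sketch (not part of the original question) that lists what the question's pattern actually matches, using preg_match_all. Each match consumes two letters, and the engine resumes scanning after the second one.

```php
<?php
// List every match of the question's pattern, in the order the engine finds them.
$subject = 'HelloWorldThisIsATest';
preg_match_all('/([A-Za-z])([A-Z])/', $subject, $matches);

// $matches[0] holds the full matches. Each one consumes two characters,
// so the next search starts right after the second letter of the match.
print_r($matches[0]); // oW, dT, sI, sA
```

After sA is consumed, the scan resumes at the T. There is no "AT" match because the A is already used up.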
The Reason Your Regex Will Not Work: Overlapping Matches
Your regex matches sA in IsATest, which lets you insert a - between the s and the A. To insert a - between the A and the T, the regex would also have to match AT. That is impossible, because the A has already been consumed as part of sA. A single pass of the regex cannot produce overlapping matches.
Do it in Two Easy Lines
Here's the easy way to do it with regex:
$regex = '~(?<=[a-zA-Z])(?=[A-Z])~';
echo strtolower(preg_replace($regex,"-","HelloWorldThisIsATest"));
Output:
hello-world-this-is-a-test
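Explanation
The (?<=[a-zA-Z]) lookbehind asserts that what precedes the current position is a letter.
The (?=[A-Z]) lookahead asserts that what follows the current position is an upper-case letter.
We replace each such position with a -, and convert the lot to lowercase.
Both assertions are zero-width, so the matches consume no characters and cannot block each other: the position between A and T matches just like the one between s and A. In a regex tester such as regex101, these matches appear as empty positions between the words.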
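If you would rather keep a capturing group as in the question, one variation (not from the answer above) is to capture only the first letter and turn the second group into a lookahead. The upper-case letter is then checked but not consumed. This is a minimal sketch: the function name camelToHyphenated is hypothetical, and it assumes ASCII letters only, per the question's precondition.

```php
<?php
// Hypothetical helper: converts UpperCamelCase (ASCII letters only) to a
// hyphenated lower-case string.
function camelToHyphenated(string $input): string
{
    // Group 1 captures (and consumes) one letter. The lookahead (?=[A-Z]) only
    // checks that an upper-case letter follows, without consuming it, so the
    // "A" in "sAT" is still available as group 1 of the next match.
    $hyphenated = preg_replace('/([A-Za-z])(?=[A-Z])/', '$1-', $input);

    return strtolower($hyphenated);
}

echo camelToHyphenated('HelloWorldThisIsATest'), "\n"; // hello-world-this-is-a-test
```

The fix is the same as in the answer: the upper-case letter must not be consumed, so consecutive capitals each get their own hyphen.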