PCRE regexp: multiple capture groups, repeated


I'd like to transform a string, containing a "simple" PHP array, into an array itself (without using eval()).

I ended up with the regexp below, which works but captures only the last 'attr' => 'val' pair, because the repeated group overwrites its captures on each iteration. I understand this can be solved with non-capturing groups, but I wasn't able to adapt that to my needs.

^\[["'](?P<route>[\w\/]+)["'](,\s*["'](?P<paramname>[\w]+)["']\s*=>\s*["']?(?P<paramval>[\w]+)?["']?){0,}

Some patterns I'd like to match:

['conrtroller/action']
['conrtroller/action', 'param' => 1]
['conrtroller/action','param' => 1]
['conrtroller/action','param' => '1']
['conrtroller/action','param' => '1','param2' => 2]

All of those work except the last one, which returns only 'param2' => 2.
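
A minimal sketch reproducing the problem, assuming the pattern above is wrapped in ~ delimiters and run with preg_match against the last sample string (the variable names are only illustrative):

```php
<?php
// The pattern from the question. The repeated group (...){0,} overwrites its
// named captures on every iteration, so only the last pair survives.
$pattern = '~^\[["\'](?P<route>[\w\/]+)["\'](,\s*["\'](?P<paramname>[\w]+)["\']\s*=>\s*["\']?(?P<paramval>[\w]+)?["\']?){0,}~';
$input = "['conrtroller/action','param' => '1','param2' => 2]";

preg_match($pattern, $input, $m);

echo $m['route'], PHP_EOL;     // conrtroller/action
echo $m['paramname'], PHP_EOL; // param2 (the first pair, 'param' => '1', is lost)
echo $m['paramval'], PHP_EOL;  // 2
```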

Ideally, only the named groups would be returned, so I don't have to skip unnecessary items.

The final goal is to construct a PHP array by looping over the matches found by preg_match or preg_match_all.


Solution

  • You can use the following expression with preg_match_all:

    (?:\G(?!^)|^\[["'](?P<route>[\w\/]+)["'])(?:,\s*["'](?P<paramname>\w+)["']\s*=>\s*["']?(?P<paramval>\w+)?["']?|)(?=.*?])
    

    See the regex demo.

    Details:

    • (?:\G(?!^)|^\[["'](?P<route>[\w\/]+)["']) - either the end of the previous successful match (\G(?!^)) or ^\[["'](?P<route>[\w\/]+)["']: start of string, [, ' or ", then one or more word or / chars captured into Group "route", and then a " or ' char
    • (?:,\s*["'](?P<paramname>\w+)["']\s*=>\s*["']?(?P<paramval>\w+)?["']?|) - a non-capturing group that matches either an empty string (see the | right before the closing parenthesis) or a comma, zero or more whitespaces, " or ', one or more word chars captured into Group "paramname", " or ', a => enclosed with zero or more whitespaces, an optional ' or ", one or more word chars captured into an optional Group "paramval", and then an optional ' or "
    • (?=.*?]) - a positive lookahead that requires a ] somewhere to the right of the current position, after zero or more chars other than line break chars, as few as possible.

    Here is a PHP demo:

    <?php
    
    $strs = [
        "['conrtroller/action']",
        "['conrtroller/action', 'param' => 1]",
        "['conrtroller/action','param' => 1]",
        "['conrtroller/action','param' => '1']",
        "['conrtroller/action','param' => '1','param2' => 2]",
    ];
    foreach ($strs as $s) {
        if (preg_match_all('~(?:\G(?!^)|^\[["\'](?P<route>[\w\/]+)["\'])(?:,\s*["\'](?P<paramname>\w+)["\']\s*=>\s*["\']?(?P<paramval>\w+)?["\']?|)(?=.*?])~', $s, $matches, PREG_SET_ORDER, 0)) {
            //print_r($matches);
            foreach ($matches as $m) {
                // Compare against '' instead of using empty(), which would also skip a value of "0".
                if (($m["route"] ?? '') !== '') { echo "---- New string match ---\n" . $m["route"] . PHP_EOL; }
                if (($m["paramname"] ?? '') !== '') { echo "- " . $m["paramname"] . PHP_EOL; }
                if (($m["paramval"] ?? '') !== '') { echo "- " . $m["paramval"] . PHP_EOL; }
            }
        }
    }
    

    yielding

    ---- New string match ---
    conrtroller/action
    ---- New string match ---
    conrtroller/action
    - param
    - 1
    ---- New string match ---
    conrtroller/action
    - param
    - 1
    ---- New string match ---
    conrtroller/action
    - param
    - 1
    ---- New string match ---
    conrtroller/action
    - param
    - 1
    - param2
    - 2
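
Since the stated goal is to build a PHP array from the matches, here is a rough sketch of how the same expression could be used for that. The helper name parseRouteString and the decision to store the route under index 0 are assumptions made for illustration, not part of the original answer.

```php
<?php
// Hypothetical helper: turns a string such as
// "['controller/action','param' => '1']" into a PHP array.
// Returns null when the string does not match the expected format.
function parseRouteString(string $s): ?array
{
    $re = '~(?:\G(?!^)|^\[["\'](?P<route>[\w\/]+)["\'])(?:,\s*["\'](?P<paramname>\w+)["\']\s*=>\s*["\']?(?P<paramval>\w+)?["\']?|)(?=.*?])~';

    if (!preg_match_all($re, $s, $matches, PREG_SET_ORDER)) {
        return null;
    }

    $result = [];
    foreach ($matches as $m) {
        // "route" is only captured by the first match of a string.
        if (($m['route'] ?? '') !== '') {
            $result[0] = $m['route'];
        }
        // The trailing empty match has no "paramname", so it is skipped here.
        if (($m['paramname'] ?? '') !== '') {
            // "paramval" is optional in the pattern; fall back to an empty string.
            $result[$m['paramname']] = $m['paramval'] ?? '';
        }
    }
    return $result;
}

var_export(parseRouteString("['conrtroller/action','param' => '1','param2' => 2]"));
```

For the last sample string this should give [0 => 'conrtroller/action', 'param' => '1', 'param2' => '2']. All values come back as strings, because the pattern does not record whether a value was quoted; cast them afterwards if integers are needed.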