php regex pcre

Repeat Capture Regex


I've been trying to get this regex to work and feel like I almost have it, but I'm not sure how to get the results I want. I'm using a mock data structure that resembles a JSON object, and I'm trying to parse its parameters.

The structure consists of groups and options, like so: group_label:id{option_1:id,option_2:id ... }

The expression I've come up with so far is

(?:(?:(?<group_name>[a-zA-Z0-9 _]+?):(?<group_id>[0-9]+?){(?:(?:(?<option_name>.+?):(?<option_id>.+?))+?,?)+?},?))+?

and the test data I'm using has been:

My Interests:379{Commercial:0,Consumer:1,Wholesale Reseller:2},Test Group:1234{Test One:1,Test 2:2}

Here is a link to the regex tester I'm using. You can see that each group becomes a match, but only the last option of each group is captured, whereas I'd like a match for every option as well.

https://regex101.com/r/GkW57Y/1

It also breaks if I try to anchor the start and end of the string, which I take as a hint that I'm doing something wrong, but I'm not a regex expert and I'm running short on time. Any advice is greatly appreciated!


Solution

  • Here's a regex that extracts the groups and options by looking for their distinguishing features: a group is followed by {, while an option is preceded by { or , and followed by , or }:

    (?<group_name>[a-zA-Z0-9 _]+):(?<group_id>[0-9]+)(?={)|(?<=[{,])(?<option_name>[^:]+):(?<option_id>[^,}]+)(?=[,}])
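
    For context on why each feature is matched separately instead of nesting quantified groups as in the question: a capture group that sits inside a repetition is overwritten on every iteration, so only its last iteration survives the match. A minimal sketch of that behaviour (the pattern and input here are illustrative, not taken from the question):

    ```php
    <?php
    // A capture group inside a quantifier is overwritten on each iteration,
    // so only the final iteration is available once the match completes.
    preg_match('/^(?:(?<item>[a-z]+),?)+$/', 'one,two,three', $m);
    echo $m['item'], "\n"; // prints "three"

    // Matching one item per match with preg_match_all keeps every item.
    preg_match_all('/(?<item>[a-z]+)/', 'one,two,three', $all);
    print_r($all['item']); // one, two, three
    ```

    This is why the regex tester shows only the last option of each group for the original pattern.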
    

    In PHP you can use it like this to get lists of groups and options:

    $string = 'My Interests:379{Commercial:0,Consumer:1,Wholesale Reseller:2},Test Group:1234{Test One:1,Test 2:2}';
    $regex = '(?<group_name>[a-zA-Z0-9 _]+):(?<group_id>[0-9]+)(?={)|(?<=[{,])(?<option_name>[^:]+):(?<option_id>[^,}]+)(?=[,}])';
    preg_match_all("/$regex/", $string, $matches);
    //print_r($matches);
    // Unmatched named groups come back as '', so drop only those ('0' is a valid value and must survive).
    $not_empty = function ($v) { return $v !== ''; };
    $groups = array_combine(array_filter($matches['group_name'], $not_empty), array_filter($matches['group_id'], $not_empty));
    $options = array_combine(array_filter($matches['option_name'], $not_empty), array_filter($matches['option_id'], $not_empty));
    print_r($groups);
    print_r($options);
    

    Output:

    Array (
        [My Interests] => 379
        [Test Group] => 1234
    )
    Array (
        [Commercial] => 0
        [Consumer] => 1
        [Wholesale Reseller] => 2
        [Test One] => 1
        [Test 2] => 2
    )
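
    The `$v !== ''` callback passed to array_filter above matters because array_filter without a callback removes every falsy value, and in PHP the string "0" is falsy. An id of 0 (as in Commercial:0) would otherwise vanish and array_combine would receive arrays of different lengths. A small sketch with made-up values:

    ```php
    <?php
    $ids = ['379', '', '0', '1'];

    // No callback: every falsy value is dropped, including the string "0".
    $loose = array_values(array_filter($ids));
    // $loose is ['379', '1']

    // Explicit comparison: only the empty placeholders are removed.
    $strict = array_values(array_filter($ids, function ($v) { return $v !== ''; }));
    // $strict is ['379', '0', '1']
    ```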
    

    If you need a more structured output, you can do something like this after getting the matches:

    $output = array();
    for ($i = 0; $i < count($matches['group_name']); $i++) {
        if ($matches['group_name'][$i] != '') {
            // new group
            $this_group = $matches['group_name'][$i];
            $output[$this_group] = array('id' => $matches['group_id'][$i]);
        }
        else {
            // option for this group
            $output[$this_group]['options'][$matches['option_name'][$i]] = $matches['option_id'][$i];
        }
    }
    print_r($output);
    

    Output:

    Array (
        [My Interests] => Array (
            [id] => 379
            [options] => Array (
                [Commercial] => 0
                [Consumer] => 1
                [Wholesale Reseller] => 2
            )
        )
        [Test Group] => Array (
            [id] => 1234
            [options] => Array (
                [Test One] => 1
                [Test 2] => 2
            )
        )
    )
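
    One caveat of keying the result by group name: if two groups share a label, the second one replaces the first. The sketch below wraps the same regex and loop in a hypothetical helper, groupByName (my naming, not part of the original code), and runs it on made-up input with a repeated label:

    ```php
    <?php
    // Hypothetical helper wrapping the regex and the name-keyed loop shown above.
    function groupByName(string $input): array
    {
        $regex = '/(?<group_name>[a-zA-Z0-9 _]+):(?<group_id>[0-9]+)(?={)'
               . '|(?<=[{,])(?<option_name>[^:]+):(?<option_id>[^,}]+)(?=[,}])/';
        preg_match_all($regex, $input, $matches);

        $output = [];
        $current = null;
        foreach ($matches['group_name'] as $i => $name) {
            if ($name !== '') {
                // new group: assigning to an existing key discards what was there
                $current = $name;
                $output[$current] = ['id' => $matches['group_id'][$i]];
            } elseif ($current !== null) {
                // option for the current group
                $output[$current]['options'][$matches['option_name'][$i]] = $matches['option_id'][$i];
            }
        }
        return $output;
    }

    // Two groups share the label "Colors"; only the second one survives.
    $result = groupByName('Colors:1{Red:0,Blue:1},Colors:2{Green:0}');
    print_r($result);
    ```

    Here $result holds a single Colors entry with id 2 and the option Green, so group labels need to be unique for this layout to be lossless.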
    

    Or possibly this might be more useful:

    $output = array();
    $this_group = -1;
    for ($i = 0; $i < count($matches['group_name']); $i++) {
        if ($matches['group_name'][$i] != '') {
            // new group
            $this_group++;
            $output[$this_group] = array('name' => $matches['group_name'][$i], 'id' => $matches['group_id'][$i]);
        }
        else {
            // option for this group
            $output[$this_group]['options'][$matches['option_name'][$i]] = $matches['option_id'][$i];
        }
    }
    print_r($output);
    

    Output:

    Array (
        [0] => Array (
            [name] => My Interests
            [id] => 379
            [options] => Array (
                [Commercial] => 0
                [Consumer] => 1
                [Wholesale Reseller] => 2
            )
        )
        [1] => Array (
            [name] => Test Group
            [id] => 1234
            [options] => Array (
                [Test One] => 1
                [Test 2] => 2
            )
        )
    )
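
    On the question's remark that the pattern breaks once start and end anchors are added: the extraction regex above matches one group header or one option per match, so anchors for the whole string cannot go into it. If the string as a whole needs checking, one option is a separate anchored pattern that runs before the extraction. This is only a sketch; isWellFormed is a hypothetical name, and the grammar it encodes (option names without : { } , and option ids without , { }) is my assumption based on the sample data.

    ```php
    <?php
    // Hypothetical validator: checks the overall shape of the string, extracts nothing.
    function isWellFormed(string $input): bool
    {
        // One option: name, colon, id (assumed character sets, see the note above).
        $option = '[^:{},]+:[^,{}]+';
        // One group: label, colon, numeric id, then a brace-delimited, comma-separated option list.
        $group = '[a-zA-Z0-9 _]+:[0-9]+\{' . $option . '(?:,' . $option . ')*\}';
        // Whole string: one or more groups separated by commas, anchored at both ends.
        $whole = '/\A' . $group . '(?:,' . $group . ')*\z/';

        return preg_match($whole, $input) === 1;
    }

    $sample = 'My Interests:379{Commercial:0,Consumer:1,Wholesale Reseller:2},Test Group:1234{Test One:1,Test 2:2}';
    var_dump(isWellFormed($sample));                          // bool(true)
    var_dump(isWellFormed('My Interests:379{Commercial:0'));  // bool(false), closing brace missing
    ```

    Keeping validation and extraction as two separate patterns avoids the repeated-capture problem while still rejecting malformed input.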
    

    Demo on 3v4l.org