php regex regex-group

Grouping of regex with same name


I am trying to write a regex that extracts the ingredient name, quantity, and unit from a string. The string can follow any of these patterns: "pohe 2 kg", "2 Kg pohe", or "2Kg Pohe". I have tried the code below:

<?php
    $units = array("tbsp", "ml", "g", "grams", "kg", "few drops"); // add whatever other units are allowed

    // mixed pattern: name-quantity-unit OR quantity-unit-name
    $pattern = '/(?J)(((?<i>^[a-zA-Z\s]+)(?<q>\d*\s*)(?<u>' . join("|", array_map("preg_quote", $units)) . '))|(?<q>^\d*\s*)(?<u>' . join("|", array_map("preg_quote", $units)) . ')(?<i>[a-zA-Z\s]+))/';

    $ingredients = '2kg pohe';
    
    preg_match_all($pattern, $ingredients, $m);
    print_r($m);
    $quantities = $m['q'];
    $units = array_map('trim', $m['u']);
    $ingrd = array_map('trim', $m['i']);
    print_r($quantities);
    print_r($units);
    print_r($ingrd);
?>

The above code works for the string "2kg pohe", but not for "pohe 2kg".

If anyone has an idea of what I am missing, please help.


Solution

  • For pohe 2kg the duplicate named groups are empty because, as the documentation of preg_match_all states for the PREG_PATTERN_ORDER flag (which is the default):

    If the pattern contains duplicate named subpatterns, only the rightmost subpattern is stored in $matches[NAME].

    In the pattern that you generate, 2kg pohe matches the second part (after the alternation), so the rightmost groups hold the values. For pohe 2kg only the first part matches, so the rightmost groups in the second part are empty, and those empty values are what $m['q'], $m['u'] and $m['i'] return.
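
    As a minimal sketch of that rule, the snippet below uses a made-up two-branch pattern. The group name w and the subject string are illustrative assumptions, not part of the question. It suggests that with the default flag the named key follows the rightmost group, while the leftmost capture is still reachable by number.

    ```php
    <?php
    // Hypothetical pattern: both alternatives reuse the group name "w".
    $demoPattern = '/(?J)(?<w>a)|(?<w>b)/';

    // Default flag is PREG_PATTERN_ORDER.
    preg_match_all($demoPattern, 'a', $demo);

    // The rightmost "w" group (second alternative) did not take part in the
    // match, so the named key holds an empty string.
    var_dump($demo['w']);

    // The leftmost "w" group is group number 1 and did capture "a".
    var_dump($demo[1]);
    ```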

    What you can do instead is use the PREG_SET_ORDER flag, which gives:

    $ingredients = '2kg pohe';
    preg_match_all($pattern, $ingredients, $m, PREG_SET_ORDER);
    print_r($m[0]);
    

    Output

    Array
    (
        [0] => 2kg pohe
        [i] =>  pohe
        [1] => 
        [q] => 2
        [2] => 
        [u] => kg
        [3] => 
        [4] => 2
        [5] => kg
        [6] =>  pohe
    )
    

    And

    $ingredients = 'pohe 2kg';
    preg_match_all($pattern, $ingredients, $m, PREG_SET_ORDER);
    print_r($m[0]);
    

    Output

    Array
    (
        [0] => pohe 2kg
        [i] => pohe 
        [1] => pohe 
        [q] => 2
        [2] => 2
        [u] => kg
        [3] => kg
    )
    

    You can then read the named groups for both strings with $m[0]['i'], $m[0]['q'] and $m[0]['u'].

    Note that the example contains 2Kg; to match it, make the pattern case insensitive, for example by adding the i flag after the closing delimiter.
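
    Putting these pieces together, a rough sketch could look like the following. The function name parseIngredient, the returned array keys, the non-capturing outer group, and the lowercasing of the unit are assumptions made for illustration and are not part of the original code. It relies on the PREG_SET_ORDER behaviour shown in the outputs above.

    ```php
    <?php
    // Hypothetical helper: returns name, quantity and unit, or null if nothing matches.
    function parseIngredient(string $text, array $units): ?array
    {
        // Escape each unit for use inside the "/"-delimited pattern.
        $unitAlt = implode('|', array_map(
            fn (string $u): string => preg_quote($u, '/'),
            $units
        ));

        // Same two alternatives as in the question. The trailing "i" flag makes
        // the match case-insensitive, so "Kg" matches the unit "kg".
        $pattern = '/(?J)(?:(?<i>^[a-zA-Z\s]+)(?<q>\d*\s*)(?<u>' . $unitAlt . ')'
                 . '|(?<q>^\d*\s*)(?<u>' . $unitAlt . ')(?<i>[a-zA-Z\s]+))/i';

        if (!preg_match_all($pattern, $text, $m, PREG_SET_ORDER)) {
            return null;
        }

        // Read the named keys of the first match set and tidy them up.
        return [
            'name'     => trim($m[0]['i']),
            'quantity' => trim($m[0]['q']),
            'unit'     => strtolower(trim($m[0]['u'])),
        ];
    }

    $allowedUnits = ['tbsp', 'ml', 'g', 'grams', 'kg', 'few drops'];
    foreach (['pohe 2 kg', '2 Kg pohe', '2Kg Pohe'] as $text) {
        print_r(parseIngredient($text, $allowedUnits));
    }
    ```

    Reading from $m[0] assumes one ingredient per string; for input with several matches you would loop over $m instead.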