I am trying to write a regex to extract the ingredient name, quantity, and unit from a string. The string can follow any of these patterns: "pohe 2 kg", "2 Kg pohe", or "2Kg Pohe". I have tried the code below:
<?PHP
$units = array("tbsp", "ml", "g", "grams", "kg", "few drops"); // add whatever other units are allowed
//mixed pattern
$pattern = '/(?J)(((?<i>^[a-zA-Z\s]+)(?<q>\d*\s*)(?<u>' . join("|", array_map("preg_quote", $units)) . '))|(?<q>^\d*\s*)(?<u>' . join("|", array_map("preg_quote", $units)) . ')(?<i>[a-zA-Z\s]+))/';
$ingredients = '2kg pohe';
preg_match_all($pattern, $ingredients, $m);
print_r($m);
$quantities = $m['q'];
$units = array_map('trim', $m['u']);
$ingrd = array_map('trim', $m['i']);
print_r($quantities);
print_r($units);
print_r($ingrd);
?>
The above code works for the string "2kg pohe", but not for "pohe 2kg".
If anyone has an idea of what I am missing, please help.
For "pohe 2kg" the duplicate named groups are empty because, as the documentation of preg_match_all states for the PREG_PATTERN_ORDER flag (which is the default):
"If the pattern contains duplicate named subpatterns, only the rightmost subpattern is stored in $matches[NAME]."
In the pattern that you generate, "2kg pohe" matches in the second part (after the alternation), so the rightmost named groups hold the values.
For "pohe 2kg" only the first part matches, so the rightmost groups in the second part capture nothing, and those empty values are what end up in $m['q'], $m['u'] and $m['i'].
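To see this rule in action, here is a minimal sketch that runs the pattern from the question with the default flag. It assumes the exact pattern from the question, where groups 3 to 5 are the named groups of the first alternative; the variable name $unitAlternation is only introduced here for readability.

```php
<?php
$units = array("tbsp", "ml", "g", "grams", "kg", "few drops");
$unitAlternation = join("|", array_map("preg_quote", $units));

// Same pattern as in the question: (?J) allows the duplicate names i, q and u.
$pattern = '/(?J)(((?<i>^[a-zA-Z\s]+)(?<q>\d*\s*)(?<u>' . $unitAlternation . '))|(?<q>^\d*\s*)(?<u>' . $unitAlternation . ')(?<i>[a-zA-Z\s]+))/';

// No flag given, so PREG_PATTERN_ORDER is used.
preg_match_all($pattern, 'pohe 2kg', $m);

// The numbered groups of the first alternative hold the captured values...
var_dump($m[3][0], $m[4][0], $m[5][0]);

// ...but the named keys are filled from the rightmost groups with those names,
// which belong to the second alternative and did not take part in this match.
var_dump($m['i'][0], $m['q'][0], $m['u'][0]);
```

The values are therefore captured, they are just not reachable by name with the default flag.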
What you might do is use the PREG_SET_ORDER flag instead, which gives:
$ingredients = '2kg pohe';
preg_match_all($pattern, $ingredients, $m, PREG_SET_ORDER);
print_r($m[0]);
Output
Array
(
[0] => 2kg pohe
[i] => pohe
[1] =>
[q] => 2
[2] =>
[u] => kg
[3] =>
[4] => 2
[5] => kg
[6] => pohe
)
And
$ingredients = 'pohe 2kg';
preg_match_all($pattern, $ingredients, $m, PREG_SET_ORDER);
print_r($m[0]);
Output
Array
(
[0] => pohe 2kg
[i] => pohe
[1] => pohe
[q] => 2
[2] => 2
[u] => kg
[3] => kg
)
Then you can get the named subgroups for both strings with $m[0]['i'], $m[0]['q'] and $m[0]['u'].
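As a rough sketch of how this could be wrapped up, the function below reads the named groups from the first match set and trims them. The name parseIngredient and the keys of the returned array are hypothetical and not part of the original code; the pattern is the one from the question.

```php
<?php
// Hypothetical helper: returns the trimmed name, quantity and unit,
// or null when the string does not match the pattern at all.
function parseIngredient(string $text, array $units): ?array
{
    $unitAlternation = join("|", array_map("preg_quote", $units));
    $pattern = '/(?J)(((?<i>^[a-zA-Z\s]+)(?<q>\d*\s*)(?<u>' . $unitAlternation . '))|(?<q>^\d*\s*)(?<u>' . $unitAlternation . ')(?<i>[a-zA-Z\s]+))/';

    // With PREG_SET_ORDER each entry of $m is one complete match.
    if (!preg_match_all($pattern, $text, $m, PREG_SET_ORDER)) {
        return null;
    }

    $first = $m[0];
    return array(
        'name'     => trim($first['i']),
        'quantity' => trim($first['q']),
        'unit'     => trim($first['u']),
    );
}

$units = array("tbsp", "ml", "g", "grams", "kg", "few drops");
print_r(parseIngredient('2kg pohe', $units));
print_r(parseIngredient('pohe 2kg', $units));
```

Both calls should give the same name, quantity and unit, regardless of which alternative matched.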
Note that the example strings in the question contain 2Kg with an uppercase K; you can make the pattern case insensitive with the i modifier so that it matches as well.
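For example, a minimal sketch with the i modifier appended after the closing delimiter, run against the three sample strings from the question. The $parsed array is only used here to collect the results for illustration.

```php
<?php
$units = array("tbsp", "ml", "g", "grams", "kg", "few drops");
$unitAlternation = join("|", array_map("preg_quote", $units));

// The trailing "i" makes the whole pattern case insensitive, so "Kg" matches "kg".
$pattern = '/(?J)(((?<i>^[a-zA-Z\s]+)(?<q>\d*\s*)(?<u>' . $unitAlternation . '))|(?<q>^\d*\s*)(?<u>' . $unitAlternation . ')(?<i>[a-zA-Z\s]+))/i';

$parsed = array();
foreach (array('pohe 2 kg', '2 Kg pohe', '2Kg Pohe') as $ingredients) {
    preg_match_all($pattern, $ingredients, $m, PREG_SET_ORDER);
    // Trim because the name and quantity groups can capture surrounding whitespace.
    $parsed[$ingredients] = array_map('trim', array($m[0]['i'], $m[0]['q'], $m[0]['u']));
}
print_r($parsed);
```

The unit is returned as it was written in the input, so normalise it with strtolower if you need a consistent value.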